Preface


With the steady increase in the complexity of equipment, in the
stringency of operating conditions, and in the positive
identification of system effectiveness requirements, it is
becoming harder to satisfy those requirements; more and more
emphasis is therefore placed on preventive maintenance, together
with the speedy repair of replicated units, as a means of
achieving them. Test equipment is also becoming more and more
complex. Maintainability engineering is, therefore, an attempt to
achieve some repair time objectives by specifying a combination of
design and human factors together with the maintenance philosophy
consistent with the other engineering and cost constraints which
exist.

The purpose of this study is to present the confidence intervals
for system reliability and availability of maintained systems
using Monte Carlo techniques. The current literature indicates
that Monte Carlo techniques offer the only practical method of
analyzing large scale systems with general failure and repair
distributions. It is hoped that this kind of study will serve as a
useful introduction to the various types of problems which are of
current interest.

ABSTRACT


This thesis presents the results of an extensive literature search
on finding the confidence intervals for system reliability and
availability of maintained systems using Monte Carlo techniques.
The characteristics of system reliability and maintainability
analysis are discussed. The basic Markovian failure and repair
models are developed. A summary of the point estimates of
availability and reliability of maintained systems, and of the
exact analytical methods, is presented. Finally, the Bootstrap
(double Monte Carlo) technique is used to obtain the confidence
limits for the availability of the maintained systems.


CONFIDENCE  INTERVALS  FOR  SYSTEM  RELIABILITY 
AND  AVAILABILITY  OF  MAINTAINED  SYSTEMS 
USING MONTE CARLO TECHNIQUES


Introduction 


The reliability and maintainability disciplines, as they exist
today, have evolved primarily during the past two decades. The
impetus for these disciplines grew largely out of military and
space programs and was related to several principal
considerations:

(a) the relatively high failure rates of equipment during the
1940's and early 1950's;

(b) the resulting sharp increase in the cost of procurement and
maintenance;

(c) the continual increase in total parts and functional
complexity; and

(d) the recognised need to have an effective approach to exclude
or minimize the conditions that contributed to faulty equipment.


In the evolution of the reliability and maintainability
disciplines, many mathematical methodologies were developed to
account for and explain the observations (failures or repairs)
generated by various equipments or systems under consideration.
Since the observations were (and continue to be) often random in
nature, i.e., stochastic rather than deterministic, probabilistic
concepts were applied to correctly analyse the particular system.
Since the Monte Carlo techniques offer the most practical method
of analysing large scale systems with general failure and repair
distributions, these techniques are used to calculate the
confidence intervals for system reliability and availability of
maintained systems.

Reliability is a yardstick of the capability of an equipment to
operate without failures when put into service. Reliability
predicts mathematically the equipment's behavior under expected
operating conditions. More specifically, reliability expresses in
numbers the probability that an equipment will operate without
failure for a given length of time in an environment for which it
was designed.

Maintainability is a more widely known term. Specifically, it is
defined as the probability that a failed system can be made
operable in a specified interval of downtime. Maintenance actions
can be classified into two categories. First, there is
off-schedule maintenance necessitated by system in-service failure
or malfunction. Its purpose is to restore system operation as soon
as possible by replacing, repairing, or adjusting the component or
components which cause interruption of service. Second, there is
scheduled maintenance at regular intervals; its purpose is to keep
the system in a condition consistent with its built-in levels of
performance, reliability, and, where applicable, safety.


Availability is defined as the probability that a system is
operating satisfactorily at any point in time. The problem is to
calculate the confidence intervals for system reliability and
availability of maintained systems using Monte Carlo techniques.

To achieve the study objectives, an extensive search was made of
the available literature. The results of this search and the
ensuing study are reported in the following sections: Section II
includes a survey of techniques to study reliability,
maintainability, and availability of maintained systems; Section
III contains a summary of point estimates of availability and
reliability of maintained systems; Section IV contains the
confidence limits for availabilities of maintained systems
obtained by exact analytical methods; and finally, Section V
contains the Monte Carlo comparisons of confidence limits for
availability and reliability. Appendix A contains a list of major
reference sources, that is, documents which contained a majority
of the references used in the study. Appendix B is a reference
bibliography containing literature which is of a general nature,
yet pertinent to the second objective of the thesis. This
information has been included as an aid to the reader for further
study. Appendix C is a supplemental bibliography containing two
types of references: those applicable to the thesis but not
selected for detailed analysis and those which appear to be
applicable but were not readily obtainable.

II. Survey of Techniques to Study Reliability,
Maintainability, and Availability of
Maintained Systems

Reliability and maintainability engineering activities are
performed for the purpose of determining and improving the
reliability and maintainability of a particular system. These
activities often involve the analysis of physical processes which
have probabilistic transition events, in which case the state of
the process at any time t is a random variable. Reliability and
maintainability analyses, therefore, rely heavily upon the
techniques of probability theory.


Reliability, Maintainability, and
Availability Defined

Reliability is the probability that a system will perform its
intended function satisfactorily for at least a given time period
under specified operating conditions. The probability of failure
as a function of time can be defined by

$$P(T \le t) = F(t), \quad t \ge 0 \tag{2-1}$$

where $T$ is a random variable denoting the failure time. Then
$F(t)$ is the probability that the system will fail by time $t$.
If we define the reliability as the probability of success, or the
probability that the system will perform its intended function at
a time $t$, we can write (Ref 43:9)

$$P(T > t) = 1 - F(t) = R(t) \tag{2-2}$$

where $R(t)$ is the reliability function. If the time-to-failure
random variable $T$ has a density function $f(t)$, then

$$R(t) = 1 - F(t) = 1 - \int_0^t f(\tau)\,d\tau = \int_t^\infty f(\tau)\,d\tau \tag{2-3}$$

Maintainability is a more widely known term. Specifically,
maintainability is defined as the probability that a failed system
can be made operable in a specified interval of downtime. Here the
downtime includes the total time that the system is out of
service. Downtime is a function of the failure detection time,
repair time, administrative time, and the logistics time connected
with the repair cycle. Theoretically, for a product there exists a
maintainability function. The maintainability function describes
probabilistically how long a system remains in a failed state.
Mathematically,

$$P(T \le t) = M(t) \tag{2-4}$$

where $T$ is the total downtime.

Availability is defined as the probability that a system is
operating satisfactorily at any point in time; it considers only
operating time and downtime, i.e., it excludes idle time.
Availability is a measure of the ratio of the operating time of
the system to the operating time plus downtime. Thus it includes
both reliability and maintainability.
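The ratio definition of availability can be illustrated with a short simulation. The following is only a sketch: it assumes exponentially distributed operating and repair times (an assumption made here for illustration), and the rates `failure_rate` and `repair_rate` are hypothetical values, not data from the study. Under that assumption the long-run ratio should approach repair_rate / (failure_rate + repair_rate).

```python
import random


def simulate_availability(failure_rate, repair_rate, n_cycles, seed=0):
    """Estimate availability as operating time / (operating time + downtime).

    Each cycle draws one operating period and one repair period from
    exponential distributions (an illustrative assumption).
    """
    rng = random.Random(seed)
    uptime = 0.0
    downtime = 0.0
    for _ in range(n_cycles):
        uptime += rng.expovariate(failure_rate)   # time until next failure
        downtime += rng.expovariate(repair_rate)  # time to restore service
    return uptime / (uptime + downtime)


# Hypothetical rates, in events per hour.
failure_rate = 0.01
repair_rate = 0.5

estimate = simulate_availability(failure_rate, repair_rate, n_cycles=20000)
steady_state = repair_rate / (failure_rate + repair_rate)
print(f"simulated availability: {estimate:.4f}")
print(f"long-run ratio        : {steady_state:.4f}")
```

The fixed seed only makes the sketch repeatable; a different seed gives a slightly different estimate.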

The reliability and maintainability fields have evolved as
distinct disciplines since 1950. Both rely heavily upon
mathematical methodologies. Since they are concerned with
probabilistic concepts, many well-established mathematical
concepts and techniques are applicable. One of these concepts is
the theory of Markov processes. Many of the determinants of the
reliability and maintainability of a system are random processes
(sequences of states), any given state of which is dependent only
upon the previous state in the process. Such processes are, by
definition, Markov processes; few references that present Markov
process theory include applications of this technique to
reliability and maintainability. Thus it is the purpose of this
chapter to provide the Markov models for reliability and
maintainability analysis.

The  Basic  Markovian  Failure  Models 

The basic Markovian failure model (Ref 30:18) consists of two
states and a single transition event. The system is in state 0 if
it is operating satisfactorily, and it is in state 1 if it has
failed. The transition event from state 0 to state 1 is a system
failure. Figure 1 shows the graph of the basic Markovian model. At
any point in time, if the system is in state 0, it can either fail
with probability $p_{01}$ or not fail with probability $p_{00}$.
System failure, state 1, is considered to be the end of the
process. Therefore, $p_{11}$ may either be ignored or set equal
to 1.

Fig. 1

Graph of the Basic Markovian Failure Model (from Ref 30:18)

The Markovian properties of the model result from the dependence
of the one-step transition probabilities on only the immediately
previous state. To amplify, the probability of the occurrence or
nonoccurrence of failure in the time interval $(t, t+\Delta t)$ is
assumed to depend only upon the system being in the unfailed state
at time $t$, regardless of the past history of the system. The
one-step transition probabilities of the basic failure model
depend upon and can be derived from the failure characteristics of
the system when it is in state 0. The probability of a system
failure in a small time interval $\Delta t$ is

$$\int_t^{t+\Delta t} f(x)\,dx$$

where $f(x)$ is the probability density of the time to failure of
the system. Since failure cannot occur unless the system is
operating, the conditional probability of failure in $\Delta t$,
given the system has not failed prior to $t$, is

$$\frac{\int_t^{t+\Delta t} f(x)\,dx}{\int_t^{\infty} f(x)\,dx} \tag{2-5}$$

Equation (2-5) divided by $\Delta t$ is the system failure rate
(Ref 3:50).

The hazard function is defined as the limit of the failure rate as
the interval approaches zero. Thus the hazard function is the
instantaneous failure rate:

$$h(t) = \frac{f(t)}{R(t)} \tag{2-6}$$

also

$$f(t) = h(t)\exp\left[-\int_0^t h(\tau)\,d\tau\right] \tag{2-7}$$

Thus $f(t)$, $R(t)$, and $h(t)$ are all related to each other and
any one implies the other two (Ref 43).
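The statement that any one of $f(t)$, $R(t)$, and $h(t)$ implies the other two can be checked numerically. The sketch below starts from a hazard function alone, recovers $R(t) = \exp[-\int_0^t h(\tau)\,d\tau]$ by numerical integration, and then obtains $f(t)$ from equation (2-7). The Weibull hazard and its parameters are hypothetical choices made only because the exact answer is known in closed form; they are not taken from the study.

```python
import math


def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull time to failure (illustrative choice of h)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def reliability_from_hazard(hazard, t, steps=10000):
    """R(t) = exp(-integral of h from 0 to t), using the midpoint rule."""
    dt = t / steps
    cumulative = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative)


# Hypothetical parameters.
shape, scale, t = 1.5, 100.0, 80.0


def h(u):
    return weibull_hazard(u, shape, scale)


r_numeric = reliability_from_hazard(h, t)
r_exact = math.exp(-((t / scale) ** shape))   # known Weibull reliability
f_from_hazard = h(t) * r_numeric              # equation (2-7)
f_exact = h(t) * r_exact                      # f = h * R, equation (2-6)

print(f"R(t) from hazard: {r_numeric:.6f}  exact: {r_exact:.6f}")
print(f"f(t) from hazard: {f_from_hazard:.6f}  exact: {f_exact:.6f}")
```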

Assuming stationary one-step transition probabilities in the basic
failure model, $p_{01} = q_{01}\,\Delta t$, the failure rate of the
system is $(q_{01}\,\Delta t)/\Delta t = q_{01}$, and
$h(t) = q_{01}$. Letting $q_{01} = \lambda$, if the time to failure
is described by an exponential density function, i.e.,

$$f(t) = \lambda e^{-\lambda t} \tag{2-8}$$

then

$$R(t) = \int_t^\infty \lambda e^{-\lambda\tau}\,d\tau = e^{-\lambda t} \tag{2-9}$$

and

$$h(t) = \frac{f(t)}{R(t)} = \lambda \tag{2-10}$$

Thus to achieve temporal homogeneity of the Markovian failure
model, exponential failure times are usually assumed.

The basic Markovian model can be easily extended to include system
states other than total operability and total failure. Figure 2 is
the graph of such a model.


Fig. 2

Graph of a Three-State Failure Model (from Ref 30:20)


States 0 and 2 are "system operating" and "system failed,"
respectively. State 1 is any state of system degradation that
changes the failure characteristics of the system but does not
cause total system failure. The transition event from state 0 to
state 1 is some form of system deterioration, and the transition
events 0 → 2 and 1 → 2 are the two different modes of system
failure that can occur. Because these three transition events
differ physically, it is usually the case that
$p_{01} \ne p_{02} \ne p_{12}$. Extensions of these basic failure
models to different system configurations require appropriate
addition and redefinition of the possible system states and their
associated transition probabilities.

Application of the Basic Markovian Models to Reliability (Ref
30:21). System reliability, $R(t)$, is the probability that the
system does not fail prior to time $t$. In terms of the basic
Markovian failure model, Figure 1, $R(t)$ is the probability that
the process does not progress to state 1 prior to time $t$. For a
continuous-time Markov failure model,

$$R(t) = 1 - p_1(t) = p_0(t) \tag{2-11}$$

Assuming stationary transition probabilities,

$$p_0(t+\Delta t) = p_0(t)\,p_{00} = p_0(t)\,(1 - q_{01}\,\Delta t) \tag{2-12}$$

$$p_0(t+\Delta t) = p_0(t) - p_0(t)\,q_{01}\,\Delta t$$

$$p_0(t+\Delta t) - p_0(t) = -p_0(t)\,q_{01}\,\Delta t$$

i.e.,

$$\frac{p_0(t+\Delta t) - p_0(t)}{\Delta t} = -p_0(t)\,q_{01}$$

Using the definition of the derivative of a function,

$$p_0'(t) = -q_{01}\,p_0(t) \tag{2-13}$$

Laplace transforming equation (2-13),

$$s\,p_0(s) - p_0(t_0) = -q_{01}\,p_0(s)$$

Assuming $p_0(t_0) = 1$ and solving for $p_0(s)$,

$$p_0(s) = \frac{1}{s + q_{01}} \tag{2-14}$$

and

$$\mathcal{L}^{-1}[p_0(s)] = p_0(t) = R(t) = e^{-q_{01}t} \tag{2-15}$$

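For the corresponding discrete-time failure model,

$$R(K) = p_0(K) = p^{0}P^{K} \tag{2-16}$$

where $p_0(K)$ is the state-0 element of the product of the
initial-state probability vector $p^{0}$ and the $K$-th power of
the transition matrix $P$. Assuming stationary $p_{ij}$ and
$p_0(0) = 1$,

$$R(K) = (1 - q_{01}\,\Delta t)^{K} \tag{2-17}$$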
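Equations (2-15) and (2-17) describe the same two-state model in continuous and discrete time, so the discrete expression should approach the continuous one as the step $\Delta t$ shrinks. A minimal sketch of that comparison follows; the failure rate `q01` and mission time `t` are hypothetical values chosen for illustration.

```python
import math


def reliability_continuous(q01, t):
    """Equation (2-15): R(t) = exp(-q01 * t)."""
    return math.exp(-q01 * t)


def reliability_discrete(q01, dt, steps):
    """Equation (2-17): R(K) = (1 - q01 * dt)**K, with K = steps."""
    return (1.0 - q01 * dt) ** steps


# Hypothetical values: failures per hour and mission length in hours.
q01 = 0.002
t = 500.0

exact = reliability_continuous(q01, t)
for dt in (50.0, 5.0, 0.5):
    steps = round(t / dt)
    approx = reliability_discrete(q01, dt, steps)
    print(f"dt = {dt:5.1f}  K = {steps:5d}  R(K) = {approx:.5f}  R(t) = {exact:.5f}")
```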

The expression for the reliability of the three-state system
described by the failure model in Figure 2 differs from that of
the two-state model in that the number of different paths
resulting in system failure is increased. When the time parameter
of the model is continuous,

$$R(t) = p_0(t) + p_1(t) = 1 - p_2(t) \tag{2-18}$$

The probability $p_2(t)$ is determined as follows: from Figure 2
and the definition of $p_{ij}(\Delta t)$ (Ref 19),

$$\|p_{ij}(\Delta t)\| = \begin{bmatrix} p_{00} & q_{01}\Delta t & q_{02}\Delta t \\ 0 & p_{11} & q_{12}\Delta t \\ 0 & 0 & 1 \end{bmatrix}$$

with $p_{00} = 1 - (q_{01} + q_{02})\Delta t$ and
$p_{11} = 1 - q_{12}\Delta t$. Then

$$\begin{aligned}
p_0(t+\Delta t) &= p_0(t)\,p_{00} = p_0(t)\,[1 - (q_{01}\Delta t + q_{02}\Delta t)] \\
p_1(t+\Delta t) &= p_0(t)\,q_{01}\Delta t + p_1(t)\,p_{11} = p_0(t)\,q_{01}\Delta t + p_1(t)\,(1 - q_{12}\Delta t) \\
p_2(t+\Delta t) &= p_0(t)\,q_{02}\Delta t + p_1(t)\,q_{12}\Delta t + p_2(t)
\end{aligned} \tag{2-19}$$

Taking derivatives,

$$\begin{aligned}
p_0'(t) &= -(q_{01} + q_{02})\,p_0(t) \\
p_1'(t) &= q_{01}\,p_0(t) - q_{12}\,p_1(t) \\
p_2'(t) &= q_{02}\,p_0(t) + q_{12}\,p_1(t)
\end{aligned} \tag{2-20}$$

Laplace transforming equation (2-20) and assuming $p_0(t_0) = 1$
and $p_1(t_0) = p_2(t_0) = 0$, then

$$\begin{aligned}
(s + q_{01} + q_{02})\,p_0(s) &= 1 \\
q_{01}\,p_0(s) - (s + q_{12})\,p_1(s) &= 0 \\
q_{02}\,p_0(s) + q_{12}\,p_1(s) - s\,p_2(s) &= 0
\end{aligned} \tag{2-21}$$

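By Cramer's Rule,

$$p_2(s) = \frac{\begin{vmatrix} s+q_{01}+q_{02} & 0 & 1 \\ q_{01} & -(s+q_{12}) & 0 \\ q_{02} & q_{12} & 0 \end{vmatrix}}{\begin{vmatrix} s+q_{01}+q_{02} & 0 & 0 \\ q_{01} & -(s+q_{12}) & 0 \\ q_{02} & q_{12} & -s \end{vmatrix}} \tag{2-22}$$

Expanding $p_2(s)$ in partial fractions,

$$p_2(s) = \frac{A}{s} + \frac{B}{s+q_{12}} + \frac{C}{s+q_{01}+q_{02}}$$

where $A$ and $B$ follow from the residues at $s = 0$ and
$s = -q_{12}$. Taking the coefficients of $s^2$ in both sides,

$$0 = As^2 + Bs^2 + Cs^2, \quad\text{i.e.,}\quad A + B + C = 0$$

so $C = -(A + B)$:

$$C = -\left[1 - \frac{q_{01}}{q_{01}+q_{02}-q_{12}}\right] = \frac{q_{12}-q_{02}}{q_{01}+q_{02}-q_{12}} = -\frac{q_{12}-q_{02}}{q_{12}-(q_{01}+q_{02})}$$

i.e.,

$$A = 1, \quad B = -\frac{q_{01}}{q_{01}+q_{02}-q_{12}}, \quad C = -\frac{q_{12}-q_{02}}{q_{12}-(q_{01}+q_{02})} \tag{2-23}$$

$$p_2(s) = \frac{1}{s} + \frac{-q_{01}/(q_{01}+q_{02}-q_{12})}{s+q_{12}} + \frac{-(q_{12}-q_{02})/(q_{12}-(q_{01}+q_{02}))}{s+q_{01}+q_{02}}$$

$$\mathcal{L}^{-1}[p_2(s)] = p_2(t) = 1 + B\exp(-q_{12}t) + C\exp[-(q_{01}+q_{02})t]$$

$$R(t) = 1 - p_2(t)$$

$$R(t) = -B\exp(-q_{12}t) - C\exp[-(q_{01}+q_{02})t] \tag{2-24}$$

With the values of (2-23), equation (2-24) reads

$$R(t) = \frac{q_{01}\exp(-q_{12}t) + (q_{02}-q_{12})\exp[-(q_{01}+q_{02})t]}{q_{01}+q_{02}-q_{12}}$$

which satisfies $R(0) = 1$, since $A + B + C = 0$ and $A = 1$ give
$-(B + C) = 1$.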
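As a check on the three-state result, the closed form of $R(t)$ implied by the partial-fraction coefficients of (2-23) can be compared with a direct numerical integration of the differential equations (2-20). The sketch below uses forward Euler steps, which is a choice made here for simplicity; the rates `q01`, `q02`, `q12` and the time `t` are hypothetical. The closed form assumes $q_{01} + q_{02} \ne q_{12}$.

```python
import math


def reliability_closed_form(q01, q02, q12, t):
    """R(t) = p0(t) + p1(t) from the partial-fraction solution.

    Requires q01 + q02 != q12 (otherwise the denominator vanishes).
    """
    d = q01 + q02 - q12
    return (q01 * math.exp(-q12 * t)
            + (q02 - q12) * math.exp(-(q01 + q02) * t)) / d


def reliability_numeric(q01, q02, q12, t, steps=20000):
    """Integrate p0' and p1' of equation (2-20) with forward Euler."""
    dt = t / steps
    p0, p1 = 1.0, 0.0                       # system starts in state 0
    for _ in range(steps):
        dp0 = -(q01 + q02) * p0
        dp1 = q01 * p0 - q12 * p1
        p0 += dp0 * dt
        p1 += dp1 * dt
    return p0 + p1                          # R(t) = 1 - p2(t)


# Hypothetical transition rates (per hour) and mission time (hours).
q01, q02, q12, t = 0.01, 0.002, 0.03, 100.0

closed = reliability_closed_form(q01, q02, q12, t)
numeric = reliability_numeric(q01, q02, q12, t)
print(f"closed form R(t): {closed:.5f}")
print(f"numerical   R(t): {numeric:.5f}")
```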

The reliability expression for the discrete-time model of Figure 2
is

$$R(K) = p_0(K) + p_1(K) = 1 - p_2(K)$$

where $p_2(K)$ is the sum of all possible paths that will cause
the system to be in state 2 after the K-th step of the process:

$$p_2(K) = p_2(0) + p_1(0)\,p_{12}(K) + p_0(0)\,p_{02}(K) \tag{2-25}$$

Using the transition matrix method (Ref 19),

$$p_2(K) = \left(p^{0}P^{K}\right)_3 \tag{2-26}$$

i.e., the third (last) element in the vector resulting from the
multiplication of the initial-state probabilities vector and the
K-th step transition probability matrix. The reliability
expressions for more complex Markovian failure models must be
derived from the individual model in terms of the possible states
and the failure characteristics of the system being modeled. The
general approach, however, is the same as shown above.
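The transition matrix method of equation (2-26) amounts to multiplying the initial-state probability vector by the one-step transition matrix K times and reading off the last element. A minimal sketch follows, using plain Python lists; the rates and the step length `dt` are hypothetical values, and the matrix is the three-state matrix of Figure 2 with diagonal entries chosen so that each row sums to one.

```python
def step(state, matrix):
    """One step of the chain: row vector times transition matrix."""
    n = len(state)
    return [sum(state[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def state_probabilities(initial, matrix, k):
    """Initial-state vector multiplied by the K-th power of the matrix."""
    state = list(initial)
    for _ in range(k):
        state = step(state, matrix)
    return state


# Hypothetical rates (per hour) and step length (hours).
q01, q02, q12, dt = 0.01, 0.002, 0.03, 1.0

P = [
    [1.0 - (q01 + q02) * dt, q01 * dt, q02 * dt],  # from state 0
    [0.0, 1.0 - q12 * dt, q12 * dt],               # from state 1
    [0.0, 0.0, 1.0],                               # state 2 is absorbing
]

p = state_probabilities([1.0, 0.0, 0.0], P, 100)
reliability = 1.0 - p[2]                           # R(K) = 1 - p2(K)
print(f"p0 = {p[0]:.5f}, p1 = {p[1]:.5f}, p2 = {p[2]:.5f}")
print(f"R(100) = {reliability:.5f}")
```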

The Basic Markovian Repair Model

The basic Markovian repair model (Ref 30:27) is shown in Figure 3.
The system states of this model are identical to the states of the
basic Markovian failure model: state 0 if the system is operating
and state 1 if the system has failed. The transition event of this
model is any repair that places a failed system in satisfactory
operating condition. From the failed state 1, the system can
either be repaired in $\Delta t$ with probability $p_{10}$ or not
repaired in $\Delta t$ with probability $p_{11}$. The repair
process is completed when the system is returned to operable
condition. Thus, $p_{00} = 1$. The random variable, time-to-repair
of a particular system, is a function of the repair
characteristics (i.e., accessibility, complexity, configuration,
etc.) of the system and the characteristics of the repair activity
(i.e., maintenance procedures, manpower, technical proficiency,
etc.). The transition probability, $p_{10}(t)$, is derived from
the probability density, $g(t)$, of the system time-to-repair.
When the repair-time characteristics of the system remain
unchanged regardless of the past history of the system, the
density, $g(t)$, remains unchanged and the repair model is
Markovian. The limit, as $\Delta t \to 0$, of the conditional
probability that the system will be repaired in
$(t, t+\Delta t)$ given that it is failed at $t$ is the
instantaneous repair rate, analogous to the hazard rate of the
basic Markovian failure model. To achieve stationary $p_{10}$,
exponential repair times are often assumed, in which case
$p_{10} = q_{10}\,\Delta t$, where $q_{10}$ is a constant repair
rate.


Because repair is the reverse process of failure, except for the
nature and direction of the transition events as discussed above,
the operation of the basic Markovian repair model is similar to
that of the basic Markovian failure model.

Application of the Basic Markovian Repair Model to Maintainability
(Ref 30:28). System maintainability, $M(t)$, is the probability
that a failed system is repaired within a specified time. In terms
of the basic Markovian repair model, Figure 3, $M(t)$ is the
probability that the system transitions from state 1 to state 0
within the time interval $(t_0, t)$. For the continuous-time
Markovian repair model,

$$M(t) = p_0(t) = 1 - p_1(t) \tag{2-27}$$

Assuming $p_{10}$ stationary and $p_1(t_0) = 1$,

$$M(t) = 1 - e^{-q_{10}t} \tag{2-28}$$

The corresponding expression for $M(t)$ of the basic discrete-time
repair model is

$$M(K) = 1 - p_1(K) = 1 - (1 - q_{10}\,\Delta t)^{K} \tag{2-29}$$
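Since the study is concerned with Monte Carlo techniques, the maintainability function of equation (2-28) offers a simple test case: sample exponential repair times and count the fraction completed within the allowed downtime. This is only a sketch under the exponential-repair assumption stated in the text; the repair rate `q10`, the downtime limit `t_allowed`, and the number of trials are hypothetical.

```python
import math
import random


def maintainability_exact(q10, t):
    """Equation (2-28): M(t) = 1 - exp(-q10 * t)."""
    return 1.0 - math.exp(-q10 * t)


def maintainability_monte_carlo(q10, t, trials, seed=1):
    """Fraction of sampled exponential repair times that finish within t."""
    rng = random.Random(seed)
    repaired = sum(1 for _ in range(trials) if rng.expovariate(q10) <= t)
    return repaired / trials


# Hypothetical values: repairs per hour and allowed downtime in hours.
q10 = 0.5
t_allowed = 2.0

exact = maintainability_exact(q10, t_allowed)
estimate = maintainability_monte_carlo(q10, t_allowed, trials=50000)
print(f"exact M(t)      : {exact:.4f}")
print(f"Monte Carlo M(t): {estimate:.4f}")
```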

As it is used above, and consistent with the definition of
maintainability, repair is corrective maintenance. Each different
mode of failure of the system requires a different repair action.
Thus the basic Markovian repair model can be extended to describe
the system at any desired level of detail. Figure 4 shows an n-th
extension of the basic Markovian repair model.


Specific  Applications  of  Markov  Processes 
to  Reliability  and  Maintainability 

Numerous extensions and combinations of the basic Markovian
failure and repair models were found during the literature search
portion of this study. Representative models showing the
applicability of Markov process techniques to the estimation of
system reliability and maintainability are presented in this
section. Specific attention is given to Markov process
applications in redundant systems and maintenance systems.
Examples of the combined failure and repair models used to
determine the system availability are also discussed.

Redundant Systems. Shooman (Ref 73) presents an extension of the
basic Markovian failure model: a two-element non-repairable
system. Figure 5 shows the graph of the system. The author makes
provision in the model for failure hazards which are a function of
time, i.e., non-homogeneous. This discussion, however, considers
only the homogeneous model with constant failure hazards,
$\lambda_i$.

The states of the system are defined as follows:
state 0 = both elements operating;
state 1 = element 1 failed, element 2 operating;
state 2 = element 2 failed, element 1 operating;
state 3 = both elements failed;
and the transitions may be described as follows:

$\lambda_i$ = the failure rate associated with the two states in
question;

$\lambda_i\,\Delta t$ = the transition probability due to element
failure in time interval $\Delta t$.

Fig. 5

Graph of Two-Element Non-repairable System

Using difference-differential equation procedures, the author
develops the following state probability expressions:

$$p_0(t+\Delta t) = [1 - (\lambda_1+\lambda_2)\Delta t]\,p_0(t) \tag{2-31}$$

$$p_1(t+\Delta t) = [\lambda_1\Delta t]\,p_0(t) + [1 - \lambda_3\Delta t]\,p_1(t) \tag{2-32}$$

$$p_2(t+\Delta t) = [\lambda_2\Delta t]\,p_0(t) + [1 - \lambda_4\Delta t]\,p_2(t) \tag{2-33}$$

$$p_3(t+\Delta t) = [\lambda_3\Delta t]\,p_1(t) + [\lambda_4\Delta t]\,p_2(t) + 1\cdot p_3(t) \tag{2-34}$$

The solutions to these equations, with initial conditions
$p_0(0) = 1$ and $p_1(0) = p_2(0) = p_3(0) = 0$, are

$$p_0(t) = e^{-(\lambda_1+\lambda_2)t} \tag{2-35}$$

$$p_1(t) = \frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_3}\left[e^{-\lambda_3 t} - e^{-(\lambda_1+\lambda_2)t}\right] \tag{2-36}$$

$$p_2(t) = \frac{\lambda_2}{\lambda_1+\lambda_2-\lambda_4}\left[e^{-\lambda_4 t} - e^{-(\lambda_1+\lambda_2)t}\right] \tag{2-37}$$

$$p_3(t) = 1 - [p_0(t) + p_1(t) + p_2(t)] \tag{2-38}$$

For a two-element redundant (parallel) system, one failure can be
tolerated; hence, there are three successful states, with
probabilities $p_0(t)$, $p_1(t)$, and $p_2(t)$. Since these states
are mutually exclusive, the expression for system reliability is

$$R(t) = p_0(t) + p_1(t) + p_2(t)$$

or

$$R(t) = e^{-(\lambda_1+\lambda_2)t} + \frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_3}\left[e^{-\lambda_3 t} - e^{-(\lambda_1+\lambda_2)t}\right] + \frac{\lambda_2}{\lambda_1+\lambda_2-\lambda_4}\left[e^{-\lambda_4 t} - e^{-(\lambda_1+\lambda_2)t}\right] \tag{2-39}$$


The author points out the complete generality of equation (2-39)
and its usefulness in determining the reliability of any
two-element redundant system with constant hazard elements (Ref
73:235). For example, if the hazard functions are the same
regardless of the state of the system, i.e.,
$\lambda_1 = \lambda_4$ and $\lambda_2 = \lambda_3$, then

$$R(t) = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2)t} \tag{2-40}$$

If the elements are identical and the failure rate with both
operating is $\lambda_b$ and with a single element operating
$\lambda_s$, then $\lambda_1 = \lambda_2 = \lambda_b$ and
$\lambda_3 = \lambda_4 = \lambda_s$, which yields

$$R(t) = \frac{2\lambda_b e^{-\lambda_s t} - \lambda_s e^{-2\lambda_b t}}{2\lambda_b - \lambda_s} \tag{2-41}$$

Finally, if $\lambda_s = \lambda_b = \lambda$, then

$$R(t) = e^{-\lambda t}\,(2 - e^{-\lambda t}) \tag{2-42}$$
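The general two-element expression (2-39) and its special cases (2-40) and (2-42) can be checked against one another numerically. The sketch below is illustrative only: the function name and the rates are hypothetical, and the general form requires $\lambda_1 + \lambda_2 \ne \lambda_3$ and $\lambda_1 + \lambda_2 \ne \lambda_4$ so that the denominators do not vanish.

```python
import math


def two_element_reliability(l1, l2, l3, l4, t):
    """Equation (2-39): sum of the three successful-state probabilities.

    Requires l1 + l2 != l3 and l1 + l2 != l4.
    """
    total = l1 + l2
    p0 = math.exp(-total * t)                                # both operating
    p1 = l1 / (total - l3) * (math.exp(-l3 * t) - p0)        # element 1 failed
    p2 = l2 / (total - l4) * (math.exp(-l4 * t) - p0)        # element 2 failed
    return p0 + p1 + p2


t = 50.0  # hypothetical mission time in hours

# Special case (2-40): hazards independent of state, l3 = l2 and l4 = l1.
l1, l2 = 0.01, 0.02
general_240 = two_element_reliability(l1, l2, l2, l1, t)
special_240 = math.exp(-l1 * t) + math.exp(-l2 * t) - math.exp(-(l1 + l2) * t)

# Special case (2-42): identical elements with a single rate lam.
lam = 0.01
general_242 = two_element_reliability(lam, lam, lam, lam, t)
special_242 = math.exp(-lam * t) * (2.0 - math.exp(-lam * t))

print(f"(2-39) vs (2-40): {general_240:.6f}  {special_240:.6f}")
print(f"(2-39) vs (2-42): {general_242:.6f}  {special_242:.6f}")
```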


It is shown in (Ref 43:217) that the expected time to system
failure, found by integrating equation (2-40) over the range of
$t$, is

$$E_s(t) = \frac{1}{\lambda_1} + \frac{1}{\lambda_2} - \frac{1}{\lambda_1+\lambda_2} \tag{2-43}$$

When all the units have the same failure rate $\lambda$, equation
(2-43) gives the mean time to failure for a two-component system
as

$$E_s(t) = \frac{1}{\lambda} + \frac{1}{\lambda} - \frac{1}{2\lambda} = \frac{3}{2\lambda}$$

For a three-component system, we have

$$E_s(t) = \frac{11}{6\lambda}$$

and in general, for an n-component system, the mean time to
failure is

$$E_s(t) = \sum_{i=1}^{n} \frac{\theta}{i} \tag{2-44}$$

where $\theta = 1/\lambda$. Here it can be seen that the marginal
gain in the mean time to system failure decreases with each
subsystem added.
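The diminishing marginal gain described by equation (2-44) is easy to tabulate. The sketch below expresses the mean time to failure in units of $\theta = 1/\lambda$ and uses exact fractions so that the values 3/2 and 11/6 quoted in the text can be reproduced without rounding; the function name is a hypothetical choice.

```python
from fractions import Fraction


def parallel_mttf(n, theta):
    """Equation (2-44): mean time to failure of n parallel components."""
    return sum(theta / i for i in range(1, n + 1))


theta = Fraction(1)  # work in units of theta = 1 / lambda

previous = Fraction(0)
for n in range(1, 6):
    mttf = parallel_mttf(n, theta)
    gain = mttf - previous            # marginal gain from the n-th component
    print(f"n = {n}:  MTTF = {mttf} theta   marginal gain = {gain} theta")
    previous = mttf
```

The marginal gain from the n-th component is $\theta/n$, which is the sense in which each added subsystem contributes less than the one before it.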

If equation (2-39) represents a two-element standby system
initially in state zero, then $\lambda_1 = \lambda_A$ and
$\lambda_2 = 0$, since element 2 cannot fail prior to failure of
element 1 because it is in an unenergized state. Thus
$\lambda_3 = \lambda_B$, and $\lambda_4$ need not be specified,
since if $p_0(0) = 1$ and $\lambda_2 = 0$, state 2 has a
probability which is zero. With these substitutions, equation
(2-39) becomes

$$R(t) = \frac{\lambda_A e^{-\lambda_B t}}{\lambda_A - \lambda_B} - \frac{\lambda_B e^{-\lambda_A t}}{\lambda_A - \lambda_B} \tag{2-45}$$


In a similar application, Zorger (Ref 91) computes the reliability
of a "maintained" or repairable two-unit system. In this model,
the author assumes the failure rates are the same regardless of
the state of the system, i.e., $\lambda_1 = \lambda_4$ and
$\lambda_2 = \lambda_3$. The repair rate of component $i$,
$\mu_i$, multiplied by the time interval $\Delta t$ gives the
probability of transition due to a repair. The graph of this
system is shown in Figure 6.

Fig. 6

Graph of Two-Element Repairable System

Due to the fact that $\Delta t$ is considered infinitesimal, the
probability of a double transition in $\Delta t$ is understood to
be zero. System reliability is computed on the basis of the first
time state 3 is reached; subsequent system repair has no effect on
the reliability of the system. Hence, $p_{31}(t)$ and $p_{32}(t)$
are also considered to be zero. The author shows that the
reliability of the repairable system is

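$$R(t) = 1 - \mathcal{L}^{-1}[p_3(s)] \tag{2-46}$$

where

$$p_3(s) = \Delta_{p_3}(s)/\Delta(s)$$

$$\begin{aligned}
\Delta_{p_3}(s) = -\big\{\, & \lambda_1\lambda_2 A_{00}\,[2s + \lambda_1 + \lambda_2 + \mu_1 + \mu_2] \\
& + \lambda_2 A_{22}\,[s^2 + s(2\lambda_1 + \lambda_2 + \mu_2) + \lambda_1^2 + \lambda_1\lambda_2 + \mu_1\lambda_1 + \mu_2\lambda_1] \\
& + \lambda_1 A_{11}\,[s^2 + s(\lambda_1 + 2\lambda_2 + \mu_1) + \lambda_2^2 + \lambda_1\lambda_2 + \mu_1\lambda_2 + \mu_2\lambda_2] \\
& + A_{33}\,[s^3 + s^2(2\lambda_1 + 2\lambda_2 + \mu_1 + \mu_2) \\
& \qquad + s(3\lambda_1\lambda_2 + \mu_2\lambda_2 + \mu_1\lambda_1 + \lambda_1^2 + \lambda_2^2 + \mu_2\lambda_1 + \mu_1\lambda_2 + \mu_1\mu_2) \\
& \qquad + \lambda_1^2\lambda_2 + \lambda_1\lambda_2^2 + \mu_2\lambda_1\lambda_2 + \mu_1\lambda_1\lambda_2] \,\big\}
\end{aligned} \tag{2-47}$$

with $A_{ii}$ representing the initial condition $p_i(t_0)$, and

$$\Delta(s) = -s\,\big[(s + \lambda_1 + \mu_2)(s + \lambda_2 + \mu_1)(s + \lambda_1 + \lambda_2) - \mu_2\lambda_2(s + \lambda_2 + \mu_1) - \mu_1\lambda_1(s + \lambda_1 + \mu_2)\big] \tag{2-48}$$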

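Letting $A_{00} = 1$, $A_{11} = A_{22} = A_{33} = 0$,
$\mu_1 = \mu_2 = \mu$, and $\lambda_1 = \lambda_2 = \lambda$, the
above expressions are greatly simplified:

$$R(t) = \frac{A e^{Bt}}{A - B} - \frac{B e^{At}}{A - B} \tag{2-49}$$

where

$$A, B = \frac{-(3\lambda + \mu) \pm \sqrt{\lambda^2 + 6\mu\lambda + \mu^2}}{2} \tag{2-50}$$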

Using equation (2-49), the author shows that the maintained
system MTBF is $(3\lambda + \mu)/(2\lambda^2)$.
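The MTBF quoted for the maintained two-unit system can be recovered from the simplified reliability expression. Writing the reliability as $R(t) = (A e^{Bt} - B e^{At})/(A - B)$, with $A$ and $B$ the two negative roots $[-(3\lambda+\mu) \pm \sqrt{\lambda^2 + 6\mu\lambda + \mu^2}]/2$, integration over $t$ from 0 to infinity gives $-(A + B)/(AB)$, and since $A + B = -(3\lambda + \mu)$ and $AB = 2\lambda^2$ this equals $(3\lambda + \mu)/(2\lambda^2)$. The sketch below evaluates both forms; the rates `lam` and `mu` are hypothetical.

```python
import math


def repairable_roots(lam, mu):
    """Roots A (plus sign) and B (minus sign) for identical units."""
    disc = math.sqrt(lam ** 2 + 6.0 * mu * lam + mu ** 2)
    a = (-(3.0 * lam + mu) + disc) / 2.0
    b = (-(3.0 * lam + mu) - disc) / 2.0
    return a, b


def repairable_reliability(lam, mu, t):
    """R(t) = (A exp(Bt) - B exp(At)) / (A - B)."""
    a, b = repairable_roots(lam, mu)
    return (a * math.exp(b * t) - b * math.exp(a * t)) / (a - b)


def mtbf_closed_form(lam, mu):
    """MTBF = (3 lam + mu) / (2 lam^2)."""
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


# Hypothetical failure and repair rates (per hour).
lam, mu = 0.01, 0.5

a, b = repairable_roots(lam, mu)
mtbf_from_roots = -(a + b) / (a * b)      # integral of R(t) over t >= 0
print(f"R(100 h)             : {repairable_reliability(lam, mu, 100.0):.5f}")
print(f"MTBF from the roots  : {mtbf_from_roots:.2f} h")
print(f"MTBF, closed form    : {mtbf_closed_form(lam, mu):.2f} h")
```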

Zorger makes no mention of system maintainability. It can,
however, be derived from the repairable system model by
considering only the repair characteristics of the system.
Assuming one repairman is available, the system is repaired when
either element 1 or element 2 is operating. Also, $p_{30} = 0$ due
to the infinitesimal probability of two repairs in $\Delta t$.
Hence,

$$M(t) = p_2(t) + p_1(t) = 1 - p_3(t) \tag{2-51}$$

By the methods used above, and neglecting the term of order
$(\Delta t)^2$,

$$p_3(t+\Delta t) = p_3(t)(1 - \mu_1\Delta t)(1 - \mu_2\Delta t) = p_3(t) - (\mu_1\Delta t + \mu_2\Delta t)\,p_3(t) \tag{2-52}$$

$$p_3'(t) = -(\mu_1 + \mu_2)\,p_3(t) \tag{2-53}$$

$$s\,p_3(s) - A_{33} = -(\mu_1 + \mu_2)\,p_3(s) \tag{2-54}$$

From the definition of maintainability, $A_{33} = 1$ and
$A_{22} = A_{11} = 0$. Thus

$$s\,p_3(s) - 1 = -(\mu_1 + \mu_2)\,p_3(s)$$

$$p_3(s)\,(s + \mu_1 + \mu_2) = 1$$

$$p_3(s) = \frac{1}{s + \mu_1 + \mu_2} \tag{2-55}$$

$$p_3(t) = e^{-(\mu_1 + \mu_2)t} \tag{2-56}$$

$$M(t) = 1 - e^{-(\mu_1 + \mu_2)t} \tag{2-57}$$

The next model considered is a "K-out-of-n" redundant structure of
$n$ identical elements with constant failure rates and no repair
capability. The system will operate when at least any $K$ of its
components are operating. The states of the system are defined to
be the number of failed components; thus, the number of system
states, $n + 1$, increases linearly with the number of components,
$n$, in the structure. The component failure rate in the i-th
state (i.e., when $i$ failures have occurred) is denoted by
$\lambda_i$. The graph or transition diagram for this system is
shown in Figure 7. The reliability of the K-out-of-n structure is
computed by summing the probabilities of successful system states
$0, 1, 2, \ldots, n-K$, i.e.,

$$R(t) = \sum_{i=0}^{n-K} p_i(t) \tag{2-58}$$


The author points out that considerable computational effort can be saved (especially if K is near n) by grouping all of the system failed states into a single collective failed state. (This is essentially equivalent to partitioning the transition matrix into successful and unsuccessful states as considered by Sandler (Ref 73:100).) This grouping is permissible because the individual probabilities of being in the failed states are of no interest in calculating system reliability. Once the system enters a failed state, it cannot, without repair, return to a good state. Thus, the system can be modeled by n - K + 1 good states and one collective failed state, a total of n - K + 2 states. Using a collective failed state, the transition diagram would appear as in Figure 7 with state (n-1) relabeled as (n-K) and state (n) relabeled as the collective failed state. The transition probabilities would also require appropriate modification.

In the special case where the hazard rate is assumed proportional to stress, i.e.,

$$\lambda_i = \frac{n}{n - i}\,\lambda_0 \quad \text{for } i = 1, 2, \ldots, n-1 \tag{2-59}$$

and the components are all assumed initially operational, the author shows that the system reliability is

$$R(t) = e^{-n\lambda_0 t} \sum_{i=0}^{n-K} \frac{(n\lambda_0 t)^i}{i!} \tag{2-60}$$
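The form of equation (2-60) can be read as follows: under (2-59) each of the n - i surviving components fails at rate $n\lambda_0/(n-i)$, so the total failure rate stays at $n\lambda_0$ in every good state and the number of failures in (0, t) is Poisson distributed. A minimal sketch of that calculation is given below; the function name and argument names are illustrative assumptions and do not come from the cited reference.

```python
import math


def load_sharing_reliability(n: int, k: int, lam0: float, t: float) -> float:
    """Reliability of a K-out-of-n structure under the stress assumption (2-59).

    Assumes the total failure rate is n*lam0 in every good state, so the
    system survives to time t if at most n-k failures have occurred.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    x = n * lam0 * t
    # Poisson probability of at most n-k failures in (0, t)
    return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(n - k + 1))
```

For k = n the sum has a single term and the result reduces to the series-system value $e^{-n\lambda_0 t}$, which gives a quick sanity check on the expression.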


Reliability prediction formulas for standby redundant systems have been developed by numerous authors. Equation (2-45) is an example of the reliability expression for a two-element standby system. In general, the reliability of a structure composed of N identical elements, of which N-1 are in standby (assuming exponentially distributed failure times and off-line failure rate equal to zero) is given by

$$R(t) = e^{-\lambda t} \sum_{j=0}^{N-1} \frac{(\lambda t)^j}{j!} \tag{2-61}$$


Benning (Ref 9) employs Markovian techniques to derive the reliability expression for a structure involving N identical components of which at least K (K < N) are required for system operation while N-K are in standby. The author shows that

$$R(t) = e^{-K\lambda t} \sum_{j=0}^{N-K} \frac{(K\lambda t)^j}{j!} \tag{2-62}$$

and that the mean life (ML) of this configuration is given by equation (2-63). For a similar K-out-of-N system in which all N components are on-line, the mean life is given by equation (2-64). Comparing equation (2-63) and equation (2-64) term-by-term shows that every term in equation (2-64) is less than or equal to the corresponding term in equation (2-63). This result, as the author explains, shows that the K-out-of-N standby structure is more reliable than the on-line structure provided switching is perfectly reliable.
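The term-by-term comparison can be illustrated numerically. The sketch below is an assumption-laden illustration, not a reproduction of equations (2-63) and (2-64): the standby mean life is taken as the integral of equation (2-62), in which each of the N-K+1 terms integrates to $1/(K\lambda)$, and the on-line mean life is taken as the sum of the mean sojourn times $1/(j\lambda)$ with j units operating. Function and parameter names are hypothetical.

```python
def mean_life_standby(n_units: int, k_required: int, lam: float) -> float:
    """Mean life with K units on-line and N-K in standby (perfect switching assumed).

    Each of the N-K+1 terms of the reliability sum integrates to 1/(K*lam).
    """
    return (n_units - k_required + 1) / (k_required * lam)


def mean_life_online(n_units: int, k_required: int, lam: float) -> float:
    """Mean life with all N units on-line and at least K required.

    With j units operating, the mean time to the next failure is 1/(j*lam).
    """
    return sum(1.0 / (j * lam) for j in range(k_required, n_units + 1))
```

Both results are sums of N-K+1 terms; each on-line term $1/(j\lambda)$ with $j \ge K$ is no larger than the matching standby term $1/(K\lambda)$, which is the comparison described in the text.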

Kapur (Ref 43:221) shows that for the perfect switching case $R_s^n(t)$, the reliability function for a standby system that has n subsystems, is

$$R_s^n(t) = e^{-\lambda t} \sum_{i=0}^{n-1} \frac{(\lambda t)^i}{i!} \tag{2-65}$$

where $\lambda$ is the failure rate and is identical for each standby unit. However, for larger k, the design of switching circuitry would be quite complex if it were done automatically. Hence, it appears that standby redundancy would be most useful in those applications in which manual switching or replacement of components would be tolerated (Ref 13:137).

Epstein and Weinstock (Ref 28) consider a similar system composed of N units of which m are on-line and N-m are off-line (standby). Of the m on-line units, at least K must be operating properly except for an "acceptable downtime" which is less than or equal to $t_0$ units of time. In other words, even if fewer than K units are operational, the system is still considered to be performing satisfactorily as long as this condition persists for less than time $t_0$. It is assumed that each element of the system has independent exponential failure and repair times and that there are N available repairmen. The authors consider the system to be in either one of two states, the good state or the bad state. If the system has had at least one failure (fewer than K units operational) for longer than $t_0$ prior to the time in question, it is considered to be in a bad state. "Conditional availability" is defined by the authors as the probability of being in a particular state at time t conditioned on the fact that the system is not in a bad state at time t.

Srinivasan (Ref 80) applies a Markov model to a standby redundant system where the switchover is not instantaneous. Switchover time is considered to be a random variable and is accumulated from the instant action is initiated to bring the standby system to the active state, to the instant at which the standby unit becomes operational. The author assumes exponentially distributed failure times, an arbitrary repair distribution, and an arbitrary distribution for the probability that switchover is completed in (0, t). After deriving a lengthy equation for the probability distribution of first time to failure, the author then obtains an explicit expression for the expected time to system failure. His results show that if cost factors are of no consideration and if the objective is to have maximum MTBF, "The best policy is to initiate switching action ..." just at the moment the subsystem becomes standby (Ref 80:176).

Kapur (Ref 43:221) considers the case of imperfect switching. He first looks at a situation where the switch simply fails to operate when called upon. The probability that the switch performs when required is $p_s$. For the two-unit standby system, the result is given by equation (2-66), and for a three-unit system


»;<*>'  ■  +  pjp3 


(2-67) 


He then considers that the switch is a complex piece of equipment and has a constant failure rate of $\lambda_s$. Thus the reliability function for the switch is

$$R_{sw}(t) = e^{-\lambda_s t}$$

and the switch can fail before it is needed. When he considers the two-unit standby system, the reliability at time t is

$$R_2^{*}(t) = P[(t_1 > t) \cup (t_1 \le t \,\cap\, t_s > t_1 \,\cap\, t_2 > t - t_1)] \tag{2-68}$$

where $t_s$ is the random variable representing time to switch failure. Figure 8 shows the success modes for a two-unit standby system.


Fig.  8 

Success Modes for a Two-Unit Standby System (Ref 43:219)


Equation (2-68) becomes

$$R_2^{*}(t) = R_1(t) + \int_0^t f_1(t_1)\,R_{sw}(t_1)\,R_2(t - t_1)\,dt_1 \tag{2-69}$$

and, by substitution for $R_{sw}(t_1)$,

$$R_2^{*}(t) = R_1(t) + \int_0^t f_1(t_1)\,e^{-\lambda_s t_1}\,R_2(t - t_1)\,dt_1 \tag{2-70}$$

The author considers the special case where all subsystems have a constant failure rate $\lambda$; equation (2-70) then reduces to equation (2-71). He finally considers the case of three-unit standby systems which have a constant failure rate $\lambda$, for which he deduces equation (2-72).
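As an illustration of the constant-failure-rate special case, the integral in equation (2-70) can be evaluated numerically and compared with a closed form. The closed form used below, $e^{-\lambda t}[1 + (\lambda/\lambda_s)(1 - e^{-\lambda_s t})]$, is obtained here by direct integration under the assumption that both units have the same rate $\lambda$; it is offered as a sketch and has not been checked against the reference's own expression. All function names are hypothetical.

```python
import math


def standby_reliability_numeric(t: float, lam: float, lam_s: float, steps: int = 2000) -> float:
    """Evaluate the two-unit standby integral by the trapezoidal rule.

    Assumes exponential units with rate lam and a switch with failure rate lam_s.
    """
    def integrand(t1: float) -> float:
        f1 = lam * math.exp(-lam * t1)        # density of the first unit's failure time
        switch_ok = math.exp(-lam_s * t1)     # switch has not failed by t1
        r2 = math.exp(-lam * (t - t1))        # standby unit survives the remaining time
        return f1 * switch_ok * r2

    h = t / steps
    total = 0.5 * (integrand(0.0) + integrand(t))
    total += sum(integrand(i * h) for i in range(1, steps))
    return math.exp(-lam * t) + h * total


def standby_reliability_closed(t: float, lam: float, lam_s: float) -> float:
    """Closed form of the same integral for identical exponential units."""
    return math.exp(-lam * t) * (1.0 + (lam / lam_s) * (1.0 - math.exp(-lam_s * t)))
```

As $\lambda_s$ tends to zero the closed form approaches $e^{-\lambda t}(1 + \lambda t)$, the perfect-switching result for two units.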


Anderson (Ref 2) employs a Markov chain model to analyze the reliability of a special class of redundant systems. These systems include those which operate in a standby mode for a long period of time in anticipation of participation in a single mission. The system consists of N modules of which q are active on-line spares. The system is in operating condition when at least N - q modules are operating; the system has failed when more than q modules have failed. The following assumptions are made:

(1) The module failure rate ($\lambda$) is constant.

(2) The module repair rate ($\mu$) is also constant and independent of the number of failed modules (one module is repaired at a time).

(3) Repairs may be made during the standby interval but not during the mission interval.

(4) The system has been in the standby mode sufficiently long to ensure its being in a steady-state condition, with respect to reliability, when it enters the mission interval.

The failure state diagram for this system is shown in Figure 9. The state number corresponds to the number of failed modules. Transitions from state k to k + 1 occur at failure rate $\lambda_k$ and transitions from state k to k - 1 occur at repair rate $\mu_k$.


Fig. 9

Failure State Diagram (from Ref 2:22)
The general differential equations for this system are the same as those developed by McGregor (Ref 57). Anderson manipulates these general equations to determine the probability of mission success, $P_{ms}$, and the expected downtime per year, DT/Yr, for the assumptions and constraints (notably the initial conditions) of the given system. Since there is no repair during the mission interval, the probability of mission success is simply the sum of the individual probabilities of being in an unfailed state. The solution for these state probabilities involves the solution of the set of simultaneous differential equations which apply during the mission interval. The author derives an explicit equation for them, which is expressed in matrix form, and gives the expected downtime per year as

$$\mathrm{DT/Yr} = 8760\,(1 - p_A)\ \text{hours/year} \tag{2-80}$$

The author also presents two sets of curves; one set shows the probability of mission failure as a function of n, MTBF/MTTR (where MTTR is the mean time to repair), and $t_m$/MTBF for different values of q. The second set of curves gives the expected system downtime per year as a function of n, q, and MTBF/MTTR. These curves are helpful in gaining insight regarding the tradeoffs among the various system parameters. Several other examples of Markov process applications to the reliability analysis of redundant systems were found in the literature. In most cases, these additional applications are extensions or modifications of the examples already presented. The interested reader may refer to Sandler (Ref 73) for an extensive general discussion of the application of Markov processes to the reliability analysis of various system configurations.


Maintenance Systems. Brosh (Ref 16) utilizes a Markovian model to analyze a multi-service maintenance system with "first come, first served" repair (queue) discipline. The system is inspected at fixed periods of time and classified into one of a finite number of decisions. The state of the system is defined by the number of machines in the system, i = 0, 1, ..., N, where N is the maximum number of machines the system can contain. There is only one maintenance crew. However, repairs may be performed in two different ways. This results in two different probabilities of completing maintenance in a unit of time. A set of costs is associated with being in each state and using each type of service. An additional cost is incurred when a machine arrives for service and the system is full. The author presents a procedure for choosing the optimal policy for controlling the system by deriving the steady-state transition probabilities. Maintenance policies are considered by partitioning the state space into two sets: one containing those states in which the unit is maintained with service type one, and the other containing those states in which the unit is maintained with service type two. The measure of effectiveness is the expected average cost per observation period over an indefinite sequence of observations. Analysis of the maintenance policy space reveals the interrelationship between system variables and optimal maintenance policies; e.g., disconnected policies are dominated
by  connected  policies  except  for  one  case  (c2)/(ci)  *  (yj/u-^) 
and  ( c ^ / c  ^ )  =  (l)/(u1)  for  which  any  policy  chosen  will  yield 
the  same  value  (c^  (A)/(u-,))  for  the  cost  function  (Ref  16:78). 

Natarajan (Ref 64) considers another important aspect of the system maintenance problem: that of assigning repair priority to a particular type of component. The author presents a Markovian characterization of an anti-aircraft system consisting of redundant radars working in conjunction with redundant computers. System failure occurs only when both radars or both computers are in a failed condition. As soon as any component fails, repairs are initiated. If, in the interim, a component of the other parallel system fails, it must either wait for repairs or it can preempt the unit in repair. The first case is known as first-in, first-out (FIFO) discipline; the second case is known as preemptive discipline. The author assumes exponential distributions with component failure rates ($\lambda_1$ and $\lambda_2$) and repair rates ($\mu_1$ and $\mu_2$). He derives somewhat lengthy expressions for mean time to system failure for both the FIFO repair policy and the preemptive policy. When no repairs are permitted, the mean time to system failure is


E(T) 
The author discusses the problem of optimal priority assignment (which component receives priority, and what priority) and compares the variations exhibited by the mean time to system failure with a priority discipline imposed on the repair process of different components. Numerical results provide the author with the following conclusions:

(1) FIFO discipline for repair of priority components is more effective than a preemptive discipline when $\lambda_1 > \lambda_2$ and $\mu_1 > \mu_2$ or when $\lambda_1 < \lambda_2$ and $\mu_1 < \mu_2$. Hence, the preemptive disciplines should be used only under special circumstances governed by emergency or risk considerations.

(2) When a preemptive discipline is imposed on the repair process, higher mean time to system failure will result if $\lambda_1 > \lambda_2$ and $\mu_1 < \mu_2$ (Ref 64:107).

As a final example of the use of Markov processes in a maintenance environment, Eckles (Ref 26) discusses a system that deteriorates stochastically and is further complicated by the fact that the state of the system is unobservable unless an inspection is performed. The author develops a model to determine the appropriate sequence of actions (replacements, repairs, inspections, etc.) that minimizes the total cost of operation (including such factors as downtime, inefficiency, and salvage value). Maintenance of the system is characterized by a discrete-parameter, non-stationary Markov process. Prior to each transition, the decision maker selects one of a finite number of available actions. The action selected and the system's age determine the subsequent one-step transition probabilities and the conditional (on the system states) distribution of the next measurements. Cost is dependent on the action taken and on the system's state assigned to each possible transition. The author shows that the action that minimizes the discounted value of expected immediate and future costs (assuming optimum future actions) is determined by the system's age and the posterior distribution over the states (Ref 26:16). Optimum maintenance policies can then be calculated using a dynamic programming method.


Availability. The expression for system availability, A(t), integrates the probabilities of reliability and maintainability. A(t) may be defined as the probability that a predicted percentage of operations of time duration T will not have any malfunctions which cannot, through maintainability,
The availability of the single component is the probability that the system is in state zero, i.e., $A(t) = p_0(t)$. For large t, the availability approaches a steady state value $A_{ss} = \mu/(\lambda + \mu)$. Noting that $\lambda = 1/\mathrm{MTBF}$ and $\mu = 1/\mathrm{MTTR}$, $A_{ss}$ can be rewritten as

$$A_{ss} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} \tag{2-86}$$


Kapur (Ref 43:227) obtained the same results for the availability as the above equations. For the exponential distribution and for some interval of time $\Delta t$, he stated

$$P[\text{system failure during } \Delta t] = \lambda \Delta t \tag{2-87}$$

and

$$P[\text{repair during } \Delta t \mid \text{system failure}] = \mu \Delta t \tag{2-88}$$

He defines the availability function, A(t), by using equations (2-87) and (2-88) as follows:

$$A(t + \Delta t) = A(t)(1 - \lambda \Delta t) + [1 - A(t)]\,\mu \Delta t = A(t) - \lambda A(t)\Delta t + \mu \Delta t - \mu A(t)\Delta t$$

or

$$\frac{A(t + \Delta t) - A(t)}{\Delta t} = -(\lambda + \mu)A(t) + \mu$$

Taking the limit as $\Delta t \to 0$ gives

$$A'(t) = -(\lambda + \mu)A(t) + \mu$$
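The limiting differential equation $A'(t) = -(\lambda + \mu)A(t) + \mu$ that follows from the difference quotient above can be checked numerically against the recursion for $A(t + \Delta t)$. The sketch below assumes the system is operating at t = 0, i.e. A(0) = 1, which is not stated explicitly in this passage; the function names are illustrative only.

```python
import math


def availability(t: float, lam: float, mu: float) -> float:
    """Solution of A'(t) = -(lam + mu) A(t) + mu with the assumed start A(0) = 1."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def availability_by_steps(t: float, lam: float, mu: float, dt: float = 1e-3) -> float:
    """Iterate A(t + dt) = A(t)(1 - lam*dt) + [1 - A(t)] mu*dt from A(0) = 1."""
    a = 1.0
    for _ in range(int(round(t / dt))):
        a = a * (1.0 - lam * dt) + (1.0 - a) * mu * dt
    return a
```

For large t the exponential term vanishes and the value settles at $\mu/(\lambda + \mu)$, the steady state availability of the single repairable component.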


He stated that, fortunately, if the downtime p.d.f. is taken as something other than exponential, the same steady state solution results. In practice, the log normal is frequently used as the downtime p.d.f. Kapur defined the intrinsic availability in the same way as the availability. The only difference is that the mean repair rate in the availability function is replaced by the mean active repair rate. Thus the steady state intrinsic availability is given by

$$A_I = \frac{\text{Mean time to failure}}{\text{Mean active repair time} + \text{Mean time to failure}}$$


Dayon (Ref 23) utilizes the steady-state availability concept to analyze a computer system consisting of a data processor and tape units. The purpose of the analysis is to solve for the MTTR of the redundant system. The author points out that defining the system states and formulating the appropriate system steady-state availability transition rate diagram is the step requiring the greatest degree of ingenuity and expertise. By contrast, subsequent steps to obtain a numerical solution for the system MTTR involve only routine mathematical manipulations (Ref 23:153). The system under discussion consists of a data processor (unit #1) requiring two tape units (units #2) for data storage. A third tape unit is on standby redundancy. A block diagram of the system is shown in Figure 11. Implied in the model, but not explicitly stated, are the assumptions of a Markovian type system with constant element failure and repair rates. Other features of the system include:

(1) All three tape units are identical.

(2) Any tape unit can be removed off-line and repaired while the system is energized.

(3) All units are de-energized when the system enters a fail state.

(4) Repair is performed on a FIFO basis.

(5) The system is re-energized and operated as soon as there are enough units repaired to have full system capability.

(6) There is one repairman.


Fig.  11 

Data  Processor  (#1)  and  Tape  Units  (#2) 


The assumed failure and repair rates for the respective units are:

$\lambda_1$ = 0.01/hour    $\mu_1$ = 1.0/hour
$\lambda_2$ = 0.02/hour    $\mu_2$ = 2.0/hour

The states of the system are defined as:

State 0: all the units up
State 1: one tape unit down
State $f_1$: two tape units down
State $f_2$: one tape and the data processor down
State $f_3$: data processor down

The steady state availability transition-rate diagram for this system is shown in Figure 12. The loops showing the probability of remaining in each state have been omitted to enhance clarity. The repair transition from state $f_2$ to state $f_3$ is indicative of the FIFO repair discipline. If the policy had been to resume operation as soon as possible, the repairman would have ceased work on the tape unit when the system entered $f_2$ and performed maintenance on the data processor until the system was returned to state 1. The author develops the availability matrix and substitutes the appropriate failure and repair rates. Numerical solution yields (after "normalizing" the state probabilities) a value for MTTR equal to 1.46 hours.
Grace (Ref 34) discusses the evaluation of steady-state system availability when there are a limited number of repairable spares for the various types of on-line units. The particular system considered consists of five on-line units and two spares of a given type, with one maintenance man (associated with a replacement rate $\delta$) and one repairman (associated with a repair rate $\mu$) assigned to the unit type. Failure, repair, and replacement times are assumed to be exponentially distributed. Hence, the availability of i units of type a, $A_{ai}$, can be determined from a Markov model which includes all possible states and transitions of the on-line and spare units of type a. The author includes a complete transition diagram (33 states) and discusses a procedure for writing the resulting equations directly from the transition diagram. For this system, the steady-state probability $\pi_{ij}$ of being in state ij is obtained by solving the set of equations:


(nx  +  sa)n00  *  un01  +  6^ 
[nx  +  (s  -  I)a]n01  =  yi02  +  5nlj  0 


(2-93) 
where n is the number of on-line positions, a is the off-line (spare) failure rate, and s is the number of spares. The normalizing condition

$$\sum_{i,j} \pi_{ij} = 1 \tag{2-94}$$

is required for a unique solution. The steady-state probability that i on-line units are not operating is given by

$$\pi_i = \sum_j \pi_{ij} \tag{2-95}$$


If  i  units  have  failed,  the  probability  that  any  1  units  are 
good  is 


P* 


i 


(2-96) 


The  author  shows  that  the  total  probability  that  any  1  par¬ 
ticular  units  are  working  is 


Aa 


i 


(2-97) 


Schick (Ref 76) employs a Markov process to determine the availability or "operational readiness" of a system which is subject to inspection and repair. Except for inspection and repair periods, the system is kept in a normal mode from which it is called if its operation is required. Failure of a primary part causes immediate shutdown, inspection, and repair. Failure during checkout is detectable only at periodic inspections, while some equipment is liable to two types of failure. Examples of such systems include missile launch facilities and computer repair systems.

Grippo (Ref 37) also employs a Markov process to evaluate the reliability and availability of large, complex systems. The author's approach enables the analyst to introduce system operation or maintenance constraints without adding appreciably to the complexity of the solution. Algorithms are derived for obtaining computer-aided transient and steady-state solutions to both reliability and availability problems. The author also presents a unique method for integrating the differential equations that can result from a Markov process.

Sasaki (Ref 74) utilizes implied Markov process techniques to develop a set of charts and nomographs which can be used to evaluate trade-off characteristics between system reliability and maintainability. The author presents methods (using the charts and nomographs) of achieving desired values of system (mission) availability for duplex parallel redundancy and duplex switch-over redundancy.

Finally, Hevesh and Harrahy (Ref 38) discuss the effect of failure on the availability or "readiness" of phased-array radar systems under different structural and repair capacity conditions. The authors also develop expressions for the availability of parallel operated equipment, whether truly or quasi-redundant.
III. Summary of Point Estimates of
Availability, Reliability of
Maintained Systems

Computation of Reliability
and Availability

The technique for obtaining the reliability and availability of a repairable structure with state dependent hazards and repair rates is best illustrated by an example (Ref 58). Consider the following two-component parallel system with:

components: $x_1$ and $x_2$
component hazards: $\lambda_1$, $\lambda_2$
repair rates: $\mu_1$, $\mu_2$
number of repairmen: 1

In general, five states are needed as tabulated in Table 3.1, where an overbar denotes a failed component.


Table 3.1

System States for 2 Component Repairable System


Good States
0 - $x_1$, $x_2$
1 - $\bar{x}_1$, $x_2$
2 - $x_1$, $\bar{x}_2$

Bad States
3 - $\bar{x}_1$, $\bar{x}_2$ ($x_1$ failed before $x_2$)
4 - $\bar{x}_2$, $\bar{x}_1$ ($x_2$ failed before $x_1$)


The transition diagrams for calculating the system reliability and availability are given in Figure 13. The distinction between the availability and reliability transition diagrams is that in the availability transition diagram repairs are allowed from all system states, while in the reliability transition diagram repairs are allowed only from the good system states. Therefore, summing the good state probabilities obtained from the availability diagram gives the probability that the system is good at time t, while summing the good state probabilities obtained from the reliability transition diagram gives the probability that the system has never entered a failed state. System states that distinguish the order in which component failures occur are necessary for the availability model since there is only one repairman and, according to the repair policy, he will work on the component that fails first. If the system is in state number three ($\bar{x}_1$, $\bar{x}_2$), then the repairman is working on component $x_1$ and the next transition must be to state number two ($x_1$, $\bar{x}_2$). If the system is in state number four ($\bar{x}_2$, $\bar{x}_1$), then the repairman is working on component $x_2$ and the next transition will be to state one ($\bar{x}_1$, $x_2$). Ordering of the component failures is not necessary for the reliability transition diagram since once the system enters a state with more than one failure, no repair is attempted.


Fig. 13

Transition Diagrams: (a) Availability, (b) Reliability

The differential equations for the state probabilities are given for availability analysis by equation (3-1a), where

$$A(t) = p_0(t) + p_1(t) + p_2(t) \tag{3-1b}$$

and for the reliability analysis by equation (3-2a), where

$$R(t) = p_0(t) + p_1(t) + p_2(t) \tag{3-2b}$$
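The five-state availability model described above (one repairman who works on the component that failed first) can be sketched as a small continuous-time Markov chain. The transition rates below are read off from the verbal description of the states and are therefore an interpretation, not a copy of the matrix in equation (3-1a); the helper names are hypothetical, and a plain Gauss-Jordan solver is used only to keep the example self-contained.

```python
def steady_state(rates: dict, n_states: int) -> list:
    """Solve pi Q = 0 with sum(pi) = 1 for a small continuous-time Markov chain.

    rates maps (from_state, to_state) -> transition rate.
    """
    # Build the generator matrix Q (rows sum to zero)
    q = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), r in rates.items():
        q[i][j] += r
        q[i][i] -= r
    # Balance equations use the transpose of Q; the last one is replaced
    # by the normalizing condition sum(pi) = 1
    a = [[q[j][i] for j in range(n_states)] for i in range(n_states)]
    b = [0.0] * n_states
    a[-1] = [1.0] * n_states
    b[-1] = 1.0
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n_states):
        pivot = max(range(col, n_states), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(n_states):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [x - f * y for x, y in zip(a[row], a[col])]
                b[row] -= f * b[col]
    return [b[i] / a[i][i] for i in range(n_states)]


def two_component_availability(lam1: float, lam2: float, mu1: float, mu2: float) -> float:
    """Steady-state availability of the two-component parallel system, one repairman."""
    rates = {
        (0, 1): lam1, (0, 2): lam2,  # first failure (x1 or x2)
        (1, 0): mu1, (2, 0): mu2,    # repair of the single failed component
        (1, 3): lam2, (2, 4): lam1,  # second failure; failure order is remembered
        (3, 2): mu1, (4, 1): mu2,    # repairman finishes the first-failed component
    }
    pi = steady_state(rates, 5)
    return pi[0] + pi[1] + pi[2]     # states 0, 1 and 2 are the good states
```

With identical components the result should agree with the lumped three-state value $(2\lambda\mu + \mu^2)/(2\lambda^2 + 2\lambda\mu + \mu^2)$, which offers a simple consistency check on the assumed transitions.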

Ordering would not have been necessary if there were two repairmen. The availability transition diagram is given in Figure 14. The reliability transition diagram remains unchanged.

If the components are identical ($\lambda_1 = \lambda_2 = \lambda$ and $\mu_1 = \mu_2 = \mu$), then states 1 and 2 can be combined and states 3 and 4 can be combined, resulting in a greatly simplified transition diagram as in Figure 15.


Availability Transition Diagram for Two Component System


It should be noted that for the case of two repairmen, when one component is failed one has the option of assigning only one of the repairmen or assigning both repairmen, permitting joint effort. When a joint effort is permitted, the repair rate is $\mu' = \beta\mu$. Sandler (Ref 73) uses $\beta = 1.5$.

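The differential equations for system availability are, therefore, given by

$$\begin{bmatrix} p_0'(t) \\ p_1'(t) \\ p_2'(t) \end{bmatrix} = \begin{bmatrix} -\lambda' & \mu' & 0 \\ \lambda' & -(\lambda + \mu') & \mu'' \\ 0 & \lambda & -\mu'' \end{bmatrix} \begin{bmatrix} p_0(t) \\ p_1(t) \\ p_2(t) \end{bmatrix} \tag{3-3a}$$

where

$$A(t) = p_0(t) + p_1(t) \tag{3-3b}$$

and the differential equations for system reliability are given by

$$\begin{bmatrix} p_0'(t) \\ p_1'(t) \\ p_2'(t) \end{bmatrix} = \begin{bmatrix} -\lambda' & \mu' & 0 \\ \lambda' & -(\lambda + \mu') & 0 \\ 0 & \lambda & 0 \end{bmatrix} \begin{bmatrix} p_0(t) \\ p_1(t) \\ p_2(t) \end{bmatrix} \tag{3-4a}$$

where

$$R(t) = p_0(t) + p_1(t) \tag{3-4b}$$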


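Equation (3-3) is readily solved for the system availability (for a system with both components initially operating), yielding

$$A(t) = 1 - \frac{\lambda\lambda'}{r_1 r_2}\left[1 - \frac{r_1 e^{r_2 t} - r_2 e^{r_1 t}}{r_1 - r_2}\right] \tag{3-5}$$

where $r_1$ and $r_2$ are the roots of the equation

$$r^2 + (\lambda + \lambda' + \mu' + \mu'')\,r + (\lambda\lambda' + \lambda'\mu'' + \mu'\mu'') = 0$$

where

$$r_1 + r_2 = -(\lambda + \lambda' + \mu' + \mu''), \qquad r_1 r_2 = \lambda\lambda' + \lambda'\mu'' + \mu'\mu'' \tag{3-6}$$

The steady state availability is given by

$$A_{ss} = \lim_{t \to \infty} A(t) = 1 - \frac{\lambda\lambda'}{r_1 r_2} = \frac{\lambda'\mu'' + \mu'\mu''}{\lambda\lambda' + \lambda'\mu'' + \mu'\mu''} \tag{3-7}$$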


Equation (3-4) is readily solved for system reliability, yielding

$$R(t) = \frac{r_3 e^{r_4 t} - r_4 e^{r_3 t}}{r_3 - r_4} \tag{3-8}$$

where $r_3$ and $r_4$ are the solutions of

$$r^2 + (\lambda + \lambda' + \mu')\,r + \lambda\lambda' = 0 \tag{3-9}$$

The roots $r_3$ and $r_4$ are clearly negative real since the discriminant of equation (3-9) satisfies

$$b^2 - 4ac = (\lambda + \lambda' + \mu')^2 - 4\lambda\lambda' = (\lambda - \lambda')^2 + 2\mu'(\lambda + \lambda') + \mu'^2 > 0$$
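The closed forms for the identical-component model can be evaluated directly. The sketch below computes the steady state availability from the three-state chain with rates $\lambda'$, $\lambda$, $\mu'$, $\mu''$, and the reliability from the roots of the quadratic $r^2 + (\lambda + \lambda' + \mu')r + \lambda\lambda' = 0$; the mean time to failure is taken as the integral of R(t), which works out to $(\lambda + \lambda' + \mu')/(\lambda\lambda')$. Function and parameter names (`lam_p` for $\lambda'$, `mu_p` for $\mu'$, `mu_pp` for $\mu''$) are illustrative assumptions.

```python
import math


def steady_state_availability(lam: float, lam_p: float, mu_p: float, mu_pp: float) -> float:
    """Steady state availability of the three-state (identical component) model."""
    return (lam_p * mu_pp + mu_p * mu_pp) / (lam * lam_p + lam_p * mu_pp + mu_p * mu_pp)


def reliability(t: float, lam: float, lam_p: float, mu_p: float) -> float:
    """Reliability from the two negative real roots of r^2 + (lam+lam_p+mu_p) r + lam*lam_p."""
    b = lam + lam_p + mu_p
    c = lam * lam_p
    d = math.sqrt(b * b - 4.0 * c)           # discriminant is positive for positive rates
    r3, r4 = (-b + d) / 2.0, (-b - d) / 2.0
    return (r3 * math.exp(r4 * t) - r4 * math.exp(r3 * t)) / (r3 - r4)


def mean_time_to_failure(lam: float, lam_p: float, mu_p: float) -> float:
    """Integral of the reliability over (0, infinity): -(r3 + r4) / (r3 * r4)."""
    return (lam + lam_p + mu_p) / (lam * lam_p)
```

Setting $\lambda' = 2\lambda$ and $\mu' = \mu$ (ordinary parallel system, one repairman) gives a mean time to failure of $(3\lambda + \mu)/(2\lambda^2)$, the same maintained-system MTBF quoted earlier for the two-element model; with $\mu' = 0$ the reliability reduces to $2e^{-\lambda t} - e^{-2\lambda t}$.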


Shooman  (Ref  73)  gave  the  solution  for  the  two  identl- 
c&x  uardl  1c]  dements  cificL  K  I’SDcii.r'rnsr*.  H0  dsi’ivsci  fol- 

loving  equations  with  respect  to  Figure  15 .  The  differential 
equations  associated  with  Figure  16  are 


$$
\begin{aligned}
\dot p_{s0}(t) + \lambda' p_{s0}(t) &= \mu' p_{s1}(t) \\
\dot p_{s1}(t) + (\mu' + \lambda)\,p_{s1}(t) &= \lambda' p_{s0}(t) \\
\dot p_{s2}(t) &= \lambda\,p_{s1}(t)
\end{aligned}
\tag{3-10}
$$

$$ p_{s0}(0) = 1, \qquad p_{s1}(0) = p_{s2}(0) = 0 $$

where

$\lambda' = 2\lambda$ for an ordinary system
$\lambda' = \lambda$ for a standby system
$\mu' = \mu$ for one repairman

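Taking Laplace transforms of equations (3-10) with these initial
conditions, and writing $P_{si}(s)$ for the transform of
$p_{si}(t)$, gives

$$
\begin{aligned}
(s + \lambda')\,P_{s0}(s) - \mu' P_{s1}(s) &= 1 \\
-\lambda' P_{s0}(s) + (s + \mu' + \lambda)\,P_{s1}(s) &= 0 \\
-\lambda P_{s1}(s) + s\,P_{s2}(s) &= 0
\end{aligned}
\tag{3-11}
$$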

Solving  the  set  of  equations  of  (3-11)  using  Cramer's  Rule 
yields 
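The displayed results of this step are missing from this copy. What follows is a reconstruction from the three transformed equations alone, so the layout is an assumption and no equation numbers are attached. Here $P_{si}(s)$ denotes the Laplace transform of $p_{si}(t)$ and $\Delta(s)$ the determinant of the coefficient matrix:

```latex
\begin{aligned}
\Delta(s) &= s\,\bigl[(s + \lambda')(s + \mu' + \lambda) - \lambda'\mu'\bigr]
           = s\,\bigl[s^2 + (\lambda + \lambda' + \mu')\,s + \lambda\lambda'\bigr] \\[4pt]
P_{s0}(s) &= \frac{s + \lambda + \mu'}{s^2 + (\lambda + \lambda' + \mu')\,s + \lambda\lambda'} \\[4pt]
P_{s1}(s) &= \frac{\lambda'}{s^2 + (\lambda + \lambda' + \mu')\,s + \lambda\lambda'} \\[4pt]
P_{s2}(s) &= \frac{\lambda\lambda'}{s\,\bigl[s^2 + (\lambda + \lambda' + \mu')\,s + \lambda\lambda'\bigr]} \\[4pt]
r_{1,2} &= \frac{-(\lambda + \lambda' + \mu') \pm
           \sqrt{(\lambda + \lambda' + \mu')^2 - 4\lambda\lambda'}}{2}
\end{aligned}
```

The denominator quadratic is the same polynomial as in equation (3-9), so its roots satisfy $r_1 r_2 = \lambda\lambda'$ and $r_1 + r_2 = -(\lambda + \lambda' + \mu')$.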


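Solution for the roots $r_1$ and $r_2$ of the denominator
quadratic, followed by inversion of the transforms, yields

$$
p_{s1}(t) = \frac{\lambda'}{r_1 - r_2}\,e^{r_1 t}
          - \frac{\lambda'}{r_1 - r_2}\,e^{r_2 t}
\tag{3-19}
$$

$$
p_{s2}(t) = 1 + \frac{r_2}{r_1 - r_2}\,e^{r_1 t}
              - \frac{r_1}{r_1 - r_2}\,e^{r_2 t}
\tag{3-20}
$$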


From equation (3-13), $r_1$ and $r_2$ are always negative real
numbers; therefore, the time functions are decaying exponentials.
For a system composed of two series elements, the system
reliability is unaffected by repair, and $R(t) = e^{-2\lambda t}$,
which can be obtained by setting $\mu' = 0$ (with
$\lambda' = 2\lambda$) in the expression for $p_{s0}(t)$.
For a parallel system (standby or ordinary), the reliability
is given by $p_{s0}(t) + p_{s1}(t)$. By appropriately choosing
the coefficients $\lambda'$, $\mu'$, and $\mu''$ (as given in
Table 3-2), a large number of systems can be modeled.


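Table 3-2  Coefficients $\lambda'$, $\mu'$, and $\mu''$ (Ref 58)

Repairable/
Non-repairable    Type       Repair Crew           $\lambda'$    $\mu'$        $\mu''$
---------------------------------------------------------------------------------------
non-repairable    parallel   0                     $2\lambda$    0             0
non-repairable    standby    0                     $\lambda$     0             0
repairable        parallel   1                     $2\lambda$    $\mu$         $\mu$
repairable        parallel   2, no joint effort    $2\lambda$    $\mu$         $2\mu$
repairable        parallel   2, joint effort       $2\lambda$    $\beta\mu$    $2\mu$
repairable        standby    1                     $\lambda$     $\mu$         $\mu$
repairable        standby    2, no joint effort    $\lambda$     $\mu$         $2\mu$
repairable        standby    2, joint effort       $\lambda$     $\beta\mu$    $2\mu$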
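The rows of Table 3-2 and the reliability solution (3-8) can be combined in a few lines. The sketch below is illustrative only: the dictionary keys, the function names, and the numerical rates are assumptions introduced here. The repeated-root branch is the limiting form of (3-8) as the two roots coincide, which happens for the non-repairable standby row; it is added here and is not stated in the text.

```python
import math

def table_3_2(lam, mu, beta=1.5):
    """Coefficients (lam', mu', mu'') keyed by (repairability, type, repair crew)."""
    return {
        ("non-repairable", "parallel", "0"): (2 * lam, 0.0, 0.0),
        ("non-repairable", "standby", "0"): (lam, 0.0, 0.0),
        ("repairable", "parallel", "1"): (2 * lam, mu, mu),
        ("repairable", "parallel", "2, no joint effort"): (2 * lam, mu, 2 * mu),
        ("repairable", "parallel", "2, joint effort"): (2 * lam, beta * mu, 2 * mu),
        ("repairable", "standby", "1"): (lam, mu, mu),
        ("repairable", "standby", "2, no joint effort"): (lam, mu, 2 * mu),
        ("repairable", "standby", "2, joint effort"): (lam, beta * mu, 2 * mu),
    }

def reliability(t, lam, lam1, mu1):
    """R(t) from (3-8), with r3, r4 the roots of r^2 + (lam+lam1+mu1) r + lam*lam1 = 0."""
    b = lam + lam1 + mu1
    disc = b * b - 4.0 * lam * lam1
    if disc <= 1e-12 * b * b:
        # Repeated root r = -b/2: limit of (3-8) is (1 - r t) e^{r t}.
        r = -b / 2.0
        return (1.0 - r * t) * math.exp(r * t)
    root = math.sqrt(disc)
    r3, r4 = (-b + root) / 2.0, (-b - root) / 2.0
    return (r3 * math.exp(r4 * t) - r4 * math.exp(r3 * t)) / (r3 - r4)

if __name__ == "__main__":
    lam, mu, t = 0.01, 0.5, 50.0  # hypothetical rates per hour and mission time
    for key, (lam1, mu1, _mu2) in table_3_2(lam, mu).items():
        print(key, round(reliability(t, lam, lam1, mu1), 6))
```

Note that $\mu''$ does not enter the reliability calculation, since the failed state is absorbing in (3-4a); it matters only for availability.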
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IV. 


Confidence Limits for Availabilities of Maintained Systems

Exact Analytical Methods

An estimate of system availability calculated from
time-to-failure and time-to-repair test data will be subject
to some degree of uncertainty due to the uncertainty associated
with the sample estimates of MTTF and MTTR. This chapter
presents techniques for determining a lower confidence
limit on system availability when time-to-failure and
time-to-repair are independent, exponentially distributed
variables. The case of lognormally distributed repair
times is also included.

The Case of Both Exponential
Distributions for the Time-to-
Failure and Time-to-Repair

Mary Thompson (Ref 83) presents the techniques for
determining a lower confidence limit by making use of the
one-to-one correspondence between availability and the ratio of
mean-time-to-repair to mean-time-to-failure in order to
define F-distributed variables upon which the confidence limits
are based.

Availability, as defined earlier, is the probability that
the system is operating satisfactorily at any point in time.
This probability can be expressed mathematically as

$$ A = \frac{\theta}{\theta + \phi} \tag{4-1} $$

where

$\theta$ = system mean-time-to-failure
$\phi$ = system mean-time-to-repair

The one-to-one correspondence between availability and
$\phi/\theta$ is obvious, since $A = 1/(1 + \phi/\theta)$.
The usual sample estimate of availability is

$$ \hat A = \frac{\hat\theta}{\hat\theta + \hat\phi} \tag{4-2} $$


where $\hat\theta$, the sample estimate of $\theta$, is
calculated from

$$ \hat\theta = \sum_{i=1}^{n_1} t_{1i} / n_1 \tag{4-3} $$

where

$t_{1i}$ = time between the $(i-1)$th and the $i$-th failures
$n_1$ = number of failures

and $\hat\phi$, the sample estimate of $\phi$, is calculated
from

$$ \hat\phi = \sum_{j=1}^{n_2} t_{2j} / n_2 \tag{4-4} $$

where

$t_{2j}$ = time-to-repair associated with the $j$-th failure
$n_2$ = number of repair actions initiated
It is assumed that $t_1$ (time-to-failure) and $t_2$
(time-to-repair) are stochastically independent random variables
with probability density functions

$$ f_1(t_1) = \frac{1}{\theta}\,e^{-t_1/\theta} \tag{4-5} $$

and

$$ f_2(t_2) = \frac{1}{\phi}\,e^{-t_2/\phi} \tag{4-6} $$

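Consider a random sample of $n_1$ times-to-failure and $n_2$
times-to-repair drawn from the above populations, with sample
means $\hat\theta$ and $\hat\phi$ calculated from equations (4-3)
and (4-4). It is well known that $2n_1\hat\theta/\theta$ and
$2n_2\hat\phi/\phi$ are chi-square distributed variables with
$2n_1$ and $2n_2$ degrees of freedom, respectively. Since they
are independent, due to the independence of the variables $t_1$
and $t_2$, it is possible to define two new variables:

$$ z_1 = \frac{\hat\theta/\theta}{\hat\phi/\phi} \tag{4-7} $$

which is F-distributed with $2n_1$, $2n_2$ degrees of freedom, and

$$ z_2 = \frac{\hat\phi/\phi}{\hat\theta/\theta} \tag{4-8} $$

which is F-distributed with $2n_2$, $2n_1$ degrees of freedom.
The variable $z_1$ can be used to obtain a lower confidence limit
for availability $A$ as follows (Ref 83), where
$F_{1-\alpha}(2n_1, 2n_2)$ denotes the $100(1-\alpha)$th percentile
of the F distribution:

$$
\Pr\left[ \frac{\hat\theta/\theta}{\hat\phi/\phi}
\le F_{1-\alpha}(2n_1, 2n_2) \right] = 1 - \alpha
$$

$$
\Pr\left[ 1 + \frac{\phi}{\theta}
\le 1 + \frac{\hat\phi}{\hat\theta}\,F_{1-\alpha}(2n_1, 2n_2) \right] = 1 - \alpha
$$

or

$$
\Pr\left[ \frac{1}{1 + \dfrac{\hat\phi}{\hat\theta}\,F_{1-\alpha}(2n_1, 2n_2)}
\le \frac{1}{1 + \dfrac{\phi}{\theta}} \right] = 1 - \alpha
$$

$$
\Pr\left[ \frac{\hat\theta}{\hat\theta + \hat\phi\,F_{1-\alpha}(2n_1, 2n_2)}
\le A \right] = 1 - \alpha
\tag{4-9}
$$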


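In most practical cases $n_1 = n_2 = n$, and equation (4-9)
becomes

$$
\Pr\left[ \frac{\hat\theta}{\hat\theta + \hat\phi\,F_{1-\alpha}(2n, 2n)}
\le A \right] = 1 - \alpha
\tag{4-10}
$$

The $(1 - \alpha)$ lower confidence limit is found from

$$
\mathrm{LCL} = \frac{\hat\theta}{\hat\theta + \hat\phi\,F_{1-\alpha}(2n, 2n)}
\tag{4-11}
$$

A two-sided $(1 - \alpha)$ confidence interval, derived in a
similar manner, is given by

$$
\mathrm{LCL} = \frac{\hat\theta}{\hat\theta + \hat\phi\,F_{1-\alpha/2}(2n, 2n)}
\tag{4-12}
$$

$$
\mathrm{UCL} = \frac{\hat\theta\,F_{1-\alpha/2}(2n, 2n)}
{\hat\theta\,F_{1-\alpha/2}(2n, 2n) + \hat\phi}
\tag{4-13}
$$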


Confidence intervals calculated from equations (4-11), (4-12),
and (4-13) cover the true value of availability $100(1 - \alpha)$
percent of the time. Curves of the 0.9 and 0.95 lower
confidence limits on availability, calculated from equation
(4-11) for values of $\hat\phi/\hat\theta$ ranging from 0 to 0.5
and for selected values of $n$ between 2 and 50, are given in
(Ref 83).

Suppose, for example, that during the field testing of
a communication system, five failures were experienced. The
average time-to-failure $\hat\theta$ was 125 hours. The average
time-to-repair $\hat\phi$ was 3 hours. Previous experience with
similar systems indicates that time-to-failure and time-to-repair
are exponentially distributed. Independence of the two variables
is assumed (in practical terms, this means that the time
required to fix a failure does not depend on how long the
equipment operated prior to the failure). A point estimate of
the system availability is

$$
\hat A = \frac{\hat\theta}{\hat\theta + \hat\phi}
= \frac{125}{125 + 3} = 0.9765
$$

To find the 90 percent lower confidence limit, compute

$$ \frac{\hat\phi}{\hat\theta} = \frac{3}{125} = 0.024 $$

From the curves given in (Ref 83), we can read directly the LCL
on availability with $\hat\phi/\hat\theta = 0.024$ and $n = 5$;
the 90 percent LCL is 0.95.
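The value read from the curves can be checked by evaluating equation (4-11) directly. The sketch below is one way to do that with the Python standard library only; it is not taken from Ref 83. It relies on the fact that an F variable with even degrees of freedom $2a$, $2b$ has a CDF expressible as a finite binomial sum, and it inverts that CDF by bisection. All function names and the simulation settings (repetition count, seed) are assumptions made for illustration.

```python
import random
from math import comb

def f_cdf_even(x, a, b):
    """CDF at x of an F variable with 2a and 2b degrees of freedom (a, b integers >= 1)."""
    if x <= 0.0:
        return 0.0
    w = a * x / (a * x + b)
    k = a + b - 1
    # Regularized incomplete beta I_w(a, b) written as a binomial tail sum.
    return sum(comb(k, j) * w**j * (1.0 - w)**(k - j) for j in range(a, k + 1))

def f_quantile_even(p, a, b):
    """p-th quantile of F(2a, 2b), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf_even(hi, a, b) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf_even(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def availability_lcl(theta_hat, phi_hat, n, alpha):
    """Lower confidence limit of equation (4-11) with n1 = n2 = n."""
    f = f_quantile_even(1.0 - alpha, n, n)
    return theta_hat / (theta_hat + phi_hat * f)

def coverage(theta, phi, n, alpha, repetitions, seed=0):
    """Fraction of simulated samples whose LCL lies at or below the true availability."""
    rng = random.Random(seed)
    true_a = theta / (theta + phi)
    f = f_quantile_even(1.0 - alpha, n, n)
    hits = 0
    for _ in range(repetitions):
        theta_hat = sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n
        phi_hat = sum(rng.expovariate(1.0 / phi) for _ in range(n)) / n
        if theta_hat / (theta_hat + phi_hat * f) <= true_a:
            hits += 1
    return hits / repetitions

if __name__ == "__main__":
    print(availability_lcl(125.0, 3.0, 5, 0.10))   # worked example, 90 percent LCL
    print(coverage(125.0, 3.0, 5, 0.10, 2000))     # should be near 0.90
```

For the worked example this gives a 90 percent LCL of about 0.947, which agrees with the 0.95 read from the curves to the precision of a graphical reading.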


Lower  Confidence  Limits  Assuming  Lognormally 
Distributed  Repair  Times 

Gray and Schucany (Ref 36) introduced lower confidence
limits for the case of lognormally distributed repair
times, using the tables of H. L. Gray and T. O. Lewis (Ref 35).
If the random variables $x$ and $y$ denote the repair times and
times to failure, respectively, the availability is
defined by

$$ A = \frac{\mu_y}{\mu_x + \mu_y} \tag{4-14} $$

where $\mu_x$ is the mean-time-to-repair (MTTR) and $\mu_y$ is
the mean-time-between-failures (MTBF). It is difficult to work
analytically with the assumption of lognormal $x$ and exponential
$y$ in trying to establish confidence limits on the ratio of
the means. Gray and Lewis (Ref 35) tabulate to some extent
the distribution of the ratio of independent lognormal and
chi-square quantities for known variance of the lognormal
distribution. These tables enable us to establish a lower
confidence limit for availability based on the assumption of
lognormal repairs.

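Gray and Schucany (Ref 36) derived the following LCL
for the availability assuming lognormally distributed repair
times. If $x$ and $y$ have lognormal and exponential
distributions, the respective probability density functions are
given by

$$
h(x; \alpha, \beta^2) =
\begin{cases}
\dfrac{1}{x\beta\sqrt{2\pi}}
\exp\left[ -\dfrac{(\ln x - \alpha)^2}{2\beta^2} \right], & x > 0 \\[6pt]
0, & \text{elsewhere}
\end{cases}
\tag{4-15}
$$

$$
f(y; \mu) =
\begin{cases}
\dfrac{1}{\mu} \exp\left[ -\dfrac{y}{\mu} \right], & y > 0 \\[6pt]
0, & \text{elsewhere}
\end{cases}
\tag{4-16}
$$

For the lognormal distribution, the parameters $\alpha$ and
$\beta^2$ are the mean and variance of $\ln(x)$, respectively;
that is,

$$ E[\ln x] = \alpha, \qquad \operatorname{var}[\ln x] = \beta^2 $$

so that

$$ \mu_x = E[x] = \exp\left[ \alpha + \tfrac{1}{2}\beta^2 \right] $$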

It can be seen that, for a random sample of size $n$ from
$h(x; \alpha, \beta^2)$, the quantity defined by

$$ Q_1 = \tilde x / e^{\alpha} $$

where

$$ \tilde x = \left( \prod_{i=1}^{n} x_i \right)^{1/n} \tag{4-17} $$

is the sample geometric mean, is distributed lognormally with
parameters 0 and $\beta^2/n$. Also, for a random sample of size
$m$ from $f(y; \mu)$ with sample mean $\bar y$, it is known that

$$ Q_2 = 2m\bar y / \mu $$

where

$$ \mu = \mu_y = E(y) \tag{4-18} $$

is distributed as chi-square with $2m$ degrees of freedom. The
quantities $Q_1$ and $Q_2$ are clearly independent. Consequently,
if  we  let 
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Therefore,  the  100C  percent  lower  confidence  limit  (LCL)  is 
given  by 


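$$
\mathrm{LCL} = \frac{2m\bar y}{2m\bar y + b_C \exp[\beta^2/2]\,\tilde x}
\tag{4-22}
$$

where $\tilde x$ is the sample geometric mean of the repair times,
$\bar y$ is the sample mean time to failure, and $b_C$ is obtained
from the tables of Gray and Lewis (Ref 35).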
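Where the Gray and Lewis tables are not at hand, the constant in the LCL formula can be approximated by simulation. The sketch below rests on an assumption that should be checked against Ref 35: it takes the tabulated constant to be the $C$-quantile of the ratio $Q_2/Q_1$, where $Q_1$ is lognormal with parameters 0 and $\beta^2/n$ and $Q_2$ is chi-square with $2m$ degrees of freedom. That choice makes the limit hold with probability $C$ when $\beta^2$ is known, but the tables themselves may be parameterized differently. The function names and draw counts are hypothetical.

```python
import math
import random

def estimate_b(conf, n, m, beta2, draws=20000, rng=None):
    """Monte Carlo estimate of the conf-quantile of Q2/Q1 (assumed meaning of b_C)."""
    rng = rng or random.Random(0)
    sigma = math.sqrt(beta2 / n)
    ratios = []
    for _ in range(draws):
        q1 = math.exp(rng.gauss(0.0, sigma))  # lognormal with parameters 0, beta2/n
        q2 = rng.gammavariate(m, 2.0)         # chi-square with 2m degrees of freedom
        ratios.append(q2 / q1)
    ratios.sort()
    k = min(draws - 1, max(0, math.ceil(conf * draws) - 1))
    return ratios[k]

def lognormal_lcl(x_geo, y_bar, m, beta2, b):
    """LCL = 2 m ybar / (2 m ybar + b exp(beta2/2) x_geo), beta2 assumed known."""
    return 2.0 * m * y_bar / (2.0 * m * y_bar + b * math.exp(beta2 / 2.0) * x_geo)

if __name__ == "__main__":
    # Hypothetical data: 10 repairs, 10 failures, beta^2 = 1.
    b = estimate_b(0.90, n=10, m=10, beta2=1.0)
    print(b, lognormal_lcl(x_geo=1.2, y_bar=100.0, m=10, beta2=1.0, b=b))
```

Because the estimate of the constant carries sampling error, a larger number of draws is advisable when the confidence level is close to 1.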

Gray and Schucany (Ref 36) gave some figures which show
the lower confidence limits versus the statistic
$\tilde x/\bar y$ for several values of $m$ and $n$ for the
single value $\beta^2 = 1.0$. They also gave some figures for
different values of $\beta^2$; these show that the LCL is
insensitive to the number of observed failures $m$ and repairs
$n$ that comprise the statistic. This insensitivity makes the
figures difficult to read, and for any given problem it is
unlikely that $m$, $n$, and $\beta^2$ would match those of the
figures.


V.  Monte Carlo Comparisons


Confidence Limits for
Availability and Reliability

Summary

Reliability and maintainability engineering are recent
and related engineering disciplines that make extensive use
of mathematical techniques. In analyzing the dynamics of
physical systems, certain probabilistic concepts have been
developed in order to account for and explain random
observations (i.e., failures or repairs). One of these concepts
embodies the theory of Markov processes, the idea that the
past has no effect on the future except through the present.
The applicability of Markov process theory and techniques to
the study of reliability and maintainability engineering has
been shown in earlier studies.

A system is designed to achieve a given performance,
and its quality is the degree to which it meets this
performance specification. Performance is normally specified in
terms of the acceptable limits of such parameters as the
maximum permissible noise or the minimum acceptable output
power of a speech transmission channel, the maximum number
of lost calls or the maximum switching time in a telephone
exchange, the stability of a power supply or the frequency
limits of an oscillator, and so on. It follows that system
failure is defined as a departure from these specified limits.
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Failure  may  be  defined  at  many  levels  whereas  only  a  system 
failure gives rise to complete loss of system use. A unit
or sub-system failure may or may not give rise to a system
failure, depending upon the presence, or otherwise, of
redundancy. Redundancy is a design configuration, as in the
duplication or triplication of units within a control system,
whereby the failure of some part of the system does not
result in a system failure. Reliability is frequently enhanced
by the use of redundancy, but the total number of unit
failures requiring repair, and hence the amount of maintenance,
is usually increased due to the additional equipment. Since
availability is a measure of the ratio of the operating time
of the system to the operating time plus the downtime,
it includes both reliability and maintainability.

This thesis concentrates mainly on the confidence limits
for the asymptotic availability of maintained systems.
Confidence limits for the availability $A(t)$ and reliability
$R(t)$ of maintained systems may be obtained by exactly the
same method as applied to the two steady-state cases studied
in this thesis, as long as equations for the case have been
derived (see Chapter III, equations 3-5, 3-7, 3-8). Using
Table 3-2, a large number of systems can be modeled.
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Results 

Case 1. Exponential failure time
Exponential repair time
Number of repetitions = 500
Mean time between failures = 100 hours
Mean time to repair = 2 hours
Exact availability = MTBF/(MTBF + MTTR) = 100/(100 + 2) = .98


Table 5.1  Results of the Double Monte Carlo Technique
Exponential Failure and Repair Times

Sample    No. of Trials       Actual Percentage Coverage
Size      per Repetition      95%      90%      85%      80%
-------------------------------------------------------------
10        100                 100      91.6     85.8     80.2
20        100                 100      89.6     85       79.2
30        100                 100      89.8     85       80.2
10        200                 95       89.2     84.2     79.4
20        200                 95.6     92.4     87.8     82.2
30        200                 95.6     90.8     87       82
10        500                 94.4     90.4     85       80.4
20        500                 94.8     90.6     84.8     78
30        500                 95.4     90.8     86       82.2
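To make the roles of "sample size", "trials per repetition", and "repetitions" concrete, here is a minimal sketch of a two-level Monte Carlo coverage experiment for the exponential case. It is not the program used to produce the table above. It assumes a parametric bootstrap (inner trials resample from exponential distributions fitted to each outer sample) and a one-sided percentile lower limit; both choices, along with all function names and the seed, are assumptions made for illustration.

```python
import random

def sample_mean_exp(mean, n, rng):
    """Mean of n exponential draws with the given population mean."""
    return sum(rng.expovariate(1.0 / mean) for _ in range(n)) / n

def bootstrap_lower_limit(theta_hat, phi_hat, n, trials, alpha, rng):
    """Percentile lower limit on availability from `trials` inner resamples."""
    boots = []
    for _ in range(trials):
        t = sample_mean_exp(theta_hat, n, rng)  # resampled mean time to failure
        p = sample_mean_exp(phi_hat, n, rng)    # resampled mean time to repair
        boots.append(t / (t + p))
    boots.sort()
    k = max(1, int(round(alpha * trials)))      # k-th smallest resampled value
    return boots[k - 1]

def coverage(mtbf, mttr, n, trials, alpha, repetitions, seed=0):
    """Fraction of outer repetitions whose lower limit is at or below the true value."""
    rng = random.Random(seed)
    true_a = mtbf / (mtbf + mttr)
    hits = 0
    for _ in range(repetitions):
        theta_hat = sample_mean_exp(mtbf, n, rng)
        phi_hat = sample_mean_exp(mttr, n, rng)
        lower = bootstrap_lower_limit(theta_hat, phi_hat, n, trials, alpha, rng)
        if lower <= true_a:
            hits += 1
    return hits / repetitions

if __name__ == "__main__":
    # Settings mirroring Case 1: MTBF = 100 h, MTTR = 2 h, 500 repetitions.
    for n in (10, 20, 30):
        print(n, coverage(100.0, 2.0, n, trials=100, alpha=0.10, repetitions=500))
```

Under these assumptions the resampled availability depends on the data only through $\hat\phi/\hat\theta$ times an F-distributed ratio with $2n$, $2n$ degrees of freedom, so the percentile limit approximates the exact F-based limit of Chapter IV and the simulated coverage should sit near the nominal level.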
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Case  2.  Exponential  failure  time 
Lognormal repair time
Number of repetitions = 500
MTBF = 100 hours
MTTR = 2 hours
Exact availability = .98

Table 5.2  Results of the Double Monte Carlo Technique
Exponential Failure Time and Lognormal Repair Time

Sample    No. of Trials       Actual Percentage of Coverage
Size      per Repetition      95%      90%      85%      80%
-------------------------------------------------------------
10        100                 100      92.1     91.3     87.4
20        100                 100      88.9     82.3     81
30        100                 100      91.4     86.2     80.5
10        200                 96       92.3     88.5     82.3
20        200                 99.3     92.2     86.1     82.8
30        200                 99.3     93.6     88.1     81.4
10        500                 99.3     93.8     87.4     83.6
20        500                 99.3     93.6     86.7     82.8
30        500                 99.3     92.8     88.1     81.5
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We  get  accurate  results  in  the  case  of  exponential  failure 
time and exponential repair time. A set of optimistic
results was obtained in the case of the lognormal repair time
with exact availability .98; that is, when repair times were
correctly assumed to be lognormally distributed, the coverage
was greater than in the situation of exponential repair time.
The same conclusion has been obtained by Gray and Schucany
(Ref 36). Better results can be obtained with smaller exact
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